Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 1 Bandwidth Challenges or "How fast can we...
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 1
Bandwidth Challenges or
"How fast can we really drive a Network?"
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then "Talks"
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 2
Collaboration at SC|05
[Photos: SCINet; the Caltech booth; the BWC at the SLAC booth; ESLEA, Boston Ltd. & Peta-Cache Sun; StorCloud]
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 3
Bandwidth Challenge wins Hat Trick
The maximum aggregate bandwidth was >151 Gbit/s: 130 DVD movies in a minute, or serving 10,000 MPEG2 HDTV movies in real time
22 10-Gigabit Ethernet waves to the Caltech & SLAC/FERMI booths; in 2 hours transferred 95.37 TByte, in 24 hours moved ~475 TBytes
Showed real-time particle event analysis
SLAC/Fermi/UK booth:
1 x 10 Gbit Ethernet to the UK over NLR & UKLight: transatlantic HEP disk-to-disk, VLBI streaming
2 x 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel StorCloud
4 x 10 Gbit links to Fermi: dcache data transfers
[Plot: traffic in to and out of the booth on the SLAC-ESnet, FermiLab-HOPI, SLAC-ESnet-USN, FNAL-UltraLight and UKLight links; the SC2004 record of 101 Gbit/s marked for comparison]
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 4
ESLEA and UKLight
6 x 1 Gbit transatlantic Ethernet layer-2 paths over UKLight + NLR
Disk-to-disk transfers with bbcp, Seattle to UK: set the TCP buffer and application to give ~850 Mbit/s; one stream of data ran at 840-620 Mbit/s
Streamed UDP VLBI data UK to Seattle at 620 Mbit/s
[Plots: rate (Mbit/s) vs date-time, 16:00-23:00 at SC|05, for hosts sc0501-sc0504 (0-1000 Mbit/s each) and for the aggregate UKLight traffic (0-4500 Mbit/s), including the reverse TCP flow]
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 5
SLAC 10 Gigabit Ethernet
2 Lightpaths: routed over ESnet; layer 2 over Ultra Science Net
6 Sun V20Z systems per λ
dcache remote disk data access: 100 processes per node; each node sends or receives; one data stream runs at 20-30 Mbit/s
Used Neterion NICs & Chelsio TOEs; data also sent to StorCloud using fibre channel links
Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 6
LightPath Topologies
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 7
Switched LightPaths [1]
Lightpaths are a fixed point-to-point path or circuit
Optical links (with FEC) have a BER of 10^-16, i.e. a packet loss rate of ~10^-12, or 1 loss in about 160 days
In SJ5, Lightpaths are known as Bandwidth Channels
Host-to-host Lightpath: one application; no congestion; advanced TCP stacks for large delay-bandwidth products
Lab-to-lab Lightpaths: many applications share; classic congestion points; TCP stream sharing and recovery; advanced TCP stacks
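The "1 loss in about 160 days" figure can be sanity-checked with a little arithmetic. The line rate here is our assumption (the slide does not state one); a steady 1 Gbit/s of 1500-byte packets is a plausible reading:

```python
# Sanity check of the "1 loss in about 160 days" claim for a lightpath with
# BER 10^-16, i.e. a packet loss rate of ~10^-12.
# Assumption (ours, not the slide's): a steady 1 Gbit/s stream of 1500-byte packets.
PACKET_LOSS_RATE = 1e-12          # one lost packet per 10^12 packets
LINE_RATE_BPS = 1e9               # assumed line rate: 1 Gbit/s
PACKET_BITS = 1500 * 8

packets_per_second = LINE_RATE_BPS / PACKET_BITS            # ~83,000 pkt/s
seconds_per_loss = 1.0 / (PACKET_LOSS_RATE * packets_per_second)
days_per_loss = seconds_per_loss / 86400

print(f"~{days_per_loss:.0f} days between losses")          # ~139 days
```

At 1 Gbit/s this gives ~139 days, the same order as the slide's "about 160 days" (a slightly lower sustained rate would give the exact figure).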
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 8
Switched LightPaths [2]
Some applications suffer when using TCP and may prefer to use UDP, DCCP, XCP …
E.g. with e-VLBI the data wave-front gets distorted and correlation fails
User-controlled Lightpaths: Grid scheduling of CPUs & network; many application flows; no congestion on each path; lightweight framing possible
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 9
Network Transport Layer Issues
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 10
Problem #1
Packet Loss
Is it important?
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 11
TCP (Reno) – Packet loss and Time
TCP takes packet loss as an indication of congestion
Time for TCP to recover its throughput after 1 lost 1500-byte packet is given by:
τ = C · RTT² / (2 · MSS)
At 1 Gbit/s: UK 6 ms → 1.6 s; Europe 25 ms → 26 s; USA 150 ms → 28 min
[Plot: time to recover (s, log scale, 10^-4 to 10^5) vs rtt (0-200 ms) for 10 Mbit, 100 Mbit, 1 Gbit and 10 Gbit links; levels for an rtt of ~200 ms @ 1 Gbit/s and for 2 min are marked]
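The recovery-time formula above is easy to evaluate directly. A minimal sketch (C is the line rate in bit/s, MSS in bits); the values land close to the slide's example figures:

```python
def reno_recovery_time(rate_bps, rtt_s, mss_bytes=1500):
    """Time for TCP Reno to regain full rate after one loss: C*RTT^2/(2*MSS)."""
    mss_bits = mss_bytes * 8
    return rate_bps * rtt_s ** 2 / (2 * mss_bits)

# Example RTTs evaluated at 1 Gbit/s:
for name, rtt in [("UK 6 ms", 0.006), ("Europe 25 ms", 0.025),
                  ("USA ~200 ms", 0.200)]:
    print(f"{name}: {reno_recovery_time(1e9, rtt):.1f} s")
```

The quadratic dependence on RTT is the point: at ~200 ms and 1 Gbit/s the recovery time is roughly half an hour, which is why a single loss is so damaging on fast long-distance paths.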
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 12
Packet Loss and new TCP Stacks
TCP Response Function: throughput vs loss rate; further to the right means faster recovery
UKLight London-Chicago-London, rtt 177 ms; packets dropped in the 2.6.6 kernel
Agreement with theory is good
Some new stacks are good at high loss rates
[Plots (sculcc1-chi-2, iperf, 13 Jan 05): TCP achievable throughput (Mbit/s) vs packet drop rate (1 in n, n = 100 to 10^8), on log and linear throughput scales, for standard TCP (A0 1500), HSTCP, Scalable, HTCP, BIC-TCP, Westwood and Vegas, with theory curves for standard and Scalable TCP]
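The "theory" curve for standard TCP is the well-known response function (Mathis et al.): throughput ≈ (MSS/RTT)·sqrt(3/2)/sqrt(p). A minimal sketch, using the slide's 177 ms rtt:

```python
import math

def reno_throughput_mbps(loss_prob, rtt_s=0.177, mss_bytes=1500):
    """Mathis response function for standard TCP:
    throughput ~ (MSS/RTT) * sqrt(3/2) / sqrt(p), returned in Mbit/s."""
    mss_bits = mss_bytes * 8
    return (mss_bits / rtt_s) * math.sqrt(1.5 / loss_prob) / 1e6

# Throughput vs drop rate "1 in n", as on the x-axis of the plots
for n in (10**4, 10**6, 10**8):
    print(f"drop 1 in {n}: {reno_throughput_mbps(1 / n):.1f} Mbit/s")
```

At a drop rate of 1 in 10^6 this gives only ~83 Mbit/s on the 177 ms path, which is why the new stacks, whose response functions sit further to the right, matter so much at Gbit rates.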
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 13
High Throughput Demonstrations
Geneva, rtt 128 ms
[Diagram: dual Xeon 2.2 GHz hosts man03 and lon01, 1 GEth links to Cisco 7609s on the 2.5 Gbit SDH MB-NG core, and via Cisco GSRs through Chicago to Geneva]
Send data with TCP; drop packets; monitor TCP with Web100
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 14
TCP Throughput – DataTAG
Different TCP stacks tested on the DataTAG network; rtt 128 ms; drop 1 in 10^6
High-Speed: rapid recovery
Scalable: very fast recovery
Standard: recovery would take ~20 mins
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 15
Problem #2
Is TCP fair?
Look at Round Trip Times & Max Transfer Unit
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 16
MTU and Fairness
Two TCP streams share a 1 Gb/s bottleneck, RTT = 117 ms
MTU = 3000 bytes: avg. throughput over a period of 7000 s = 243 Mb/s
MTU = 9000 bytes: avg. throughput over a period of 7000 s = 464 Mb/s
Link utilization: 70.7%
[Diagram: Host #1 and Host #2 at CERN (GVA), 1 GE each into a GbE switch, POS 2.5 Gbps to Starlight (Chi), 1 GE bottleneck to the receiving hosts]
[Plot: throughput (Mbps) vs time (s) for the two streams, with the average over the life of each connection]
Sylvain Ravot, DataTAG 2003
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 17
RTT and Fairness
Two TCP streams share a 1 Gb/s bottleneck; MTU = 9000 bytes
CERN <-> Sunnyvale, RTT = 181 ms: avg. throughput over a period of 7000 s = 202 Mb/s
CERN <-> Starlight, RTT = 117 ms: avg. throughput over a period of 7000 s = 514 Mb/s
Link utilization: 71.6%
[Diagram: hosts at CERN (GVA) into a GbE switch, POS 2.5 Gb/s to Starlight (Chi), then POS 10 Gb/s / 10GE on to Sunnyvale; 1 GE bottleneck]
[Plot: throughput (Mbps) vs time (s) for the two streams, with the average over the life of each connection]
Sylvain Ravot, DataTAG 2003
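The RTT bias in these measurements follows directly from AIMD: each flow adds one MSS to its window per RTT, so the short-RTT flow grows its rate faster and keeps a larger share after every synchronised loss. A toy fluid simulation (ours, not the DataTAG measurement code) of two Reno flows on a shared 1 Gb/s bottleneck reproduces the asymmetry:

```python
# Toy AIMD (Reno) model of two flows sharing a 1 Gb/s bottleneck, with the
# RTTs from the slide.  Each flow adds 1 MSS to cwnd per RTT; both halve
# cwnd whenever the summed rate exceeds the link (synchronised-loss
# assumption).  This is an illustration, not the measurement setup.
BOTTLENECK_BPS = 1e9
MSS_BITS = 9000 * 8                     # 9000-byte MTU, as on the slide
flows = {"RTT=117ms": {"rtt": 0.117, "cwnd": 10.0},
         "RTT=181ms": {"rtt": 0.181, "cwnd": 10.0}}

dt, duration = 0.05, 7000.0
totals = {k: 0.0 for k in flows}
t = 0.0
while t < duration:
    rates = {k: f["cwnd"] * MSS_BITS / f["rtt"] for k, f in flows.items()}
    if sum(rates.values()) > BOTTLENECK_BPS:    # loss epoch: both flows halve
        for f in flows.values():
            f["cwnd"] /= 2
    for k, f in flows.items():
        f["cwnd"] += dt / f["rtt"]              # additive increase: 1 MSS per RTT
        totals[k] += rates[k] * dt
    t += dt

for k, total in totals.items():
    print(f"{k}: avg {total / duration / 1e6:.0f} Mb/s")
```

In steady state the model gives the short-RTT flow roughly (181/117)² ≈ 2.4 times the bandwidth of the other, the same asymmetry as the measured 514 vs 202 Mb/s.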
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 18
Problem #n
Do TCP Flows Share the Bandwidth?
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 19
Test of TCP Sharing: Methodology (1 Gbit/s)
Chose 3 paths from SLAC (California): Caltech (10 ms), Univ. Florida (80 ms), CERN (180 ms)
Used iperf/TCP and UDT/UDP to generate traffic
Each run was 16 minutes, in 7 regions
[Diagram: iperf or UDT traffic plus 1/s ICMP/ping probes from SLAC to CERN across the TCP/UDP bottleneck; regions of 2 min and 4 min]
Les Cottrell & RHJ, PFLDnet 2005
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester20
Low performance on fast long distance paths AIMD (add a=1 pkt to cwnd / RTT, decrease cwnd by factor b=0.5 in congestion) Net effect: recovers slowly, does not effectively use available bandwidth, so poor
throughput Unequal sharing
TCP Reno single stream
Congestion has a dramatic effect
Recovery is slow
Increase recovery rate
SLAC to CERN
RTT increases when achieves best throughput
Les Cottrell & RHJ PFLDnet 2005
Remaining flows do not take up slack when flow removed
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 21
Hamilton TCP: one of the best performers
Throughput is high; big effects on RTT when it achieves best throughput; flows share equally
SLAC-CERN: appears to need >1 flow to achieve best throughput
Two flows share equally; >2 flows appears less stable
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 22
Problem #n+1
To SACK or not to SACK?
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 23
The SACK Algorithm
SACK rationale: non-contiguous blocks of data can be ACKed; the sender retransmits just the lost packets; this helps when multiple packets are lost in one TCP window
But SACK processing is inefficient for large bandwidth-delay products: the sender's write queue (a linked list) is walked for each SACK block, to mark lost packets and to retransmit
Processing takes so long that the input queue becomes full and timeouts occur
[Plots: standard SACKs vs updated SACKs, HS-TCP at rtt 150 ms; Dell 1650 2.8 GHz, PCI-X 133 MHz, Intel Pro/1000]
Doug Leith, Yee-Ting Li
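The scale of that list walk is set by the bandwidth-delay product in packets: that is how many write-queue entries one SACK may force the sender to traverse. A quick sketch of the numbers:

```python
# Window sizes that make linked-list SACK processing hurt: the
# bandwidth-delay product in packets is the length of the retransmission
# queue a single SACK block may force the sender to walk.
def bdp_packets(rate_bps, rtt_s, mss_bytes=1500):
    """Outstanding packets needed to fill the pipe."""
    return int(rate_bps * rtt_s / (mss_bytes * 8))

for rate_gbps in (1, 10):
    w = bdp_packets(rate_gbps * 1e9, 0.150)
    print(f"{rate_gbps} Gbit/s, rtt 150 ms: ~{w} packets in flight per walk")
```

At 10 Gbit/s and 150 ms that is ~125,000 entries per walk, repeated for every arriving ACK, which is why the processing falls behind and timeouts follow.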
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 24
What does the User & Application make of this?
The view from the Application
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester25
SC2004: Disk-Disk bbftp bbftp file transfer program uses TCP/IP UKLight: Path:- London-Chicago-London; PCs:- Supermicro +3Ware RAID0 MTU 1500 bytes; Socket size 22 Mbytes; rtt 177ms; SACK off Move a 2 Gbyte file Web100 plots:
Standard TCP Average 825 Mbit/s (bbcp: 670 Mbit/s)
Scalable TCP Average 875 Mbit/s (bbcp: 701 Mbit/s
~4.5s of overhead)
Disk-TCP-Disk at 1Gbit/sis here!
0
500
1000
1500
2000
2500
0 5000 10000 15000 20000
time msT
CP
Ach
ive M
bit
/s
050000001000000015000000200000002500000030000000350000004000000045000000
Cw
nd
InstaneousBW
AveBW
CurCwnd (Value)
0
500
1000
1500
2000
2500
0 5000 10000 15000 20000
time ms
TC
PA
ch
ive M
bit
/s
050000001000000015000000200000002500000030000000350000004000000045000000
Cw
nd
InstaneousBWAveBWCurCwnd (Value)
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 26
SC|05 HEP: Moving data with bbcp
What is the end-host doing with your network protocol? Look at the PCI-X
3Ware 9000 controller RAID0; 1 Gbit Ethernet link; 2.4 GHz dual Xeon; ~660 Mbit/s
[Bus traces: the PCI-X bus with the RAID controller reads from disk for 44 ms every 100 ms; the PCI-X bus with the Ethernet NIC writes to the network for 72 ms]
Power is needed in the end hosts, and careful application design
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester27
VLBI: TCP Stack & CPU Load Real User problem! End host TCP flow at 960 Mbit/s with rtt 1 ms falls to 770 Mbit/s when rtt 15 ms
mk5-606-g7_10Dec05
0.0010.0020.0030.0040.0050.0060.0070.0080.0090.00
100.00
0 2 4 6 8 10 12 14 16 18 20nice large value - low priority
% C
PU
mo
de
se
nd
kernel
user
nice
idle
no CPU load
0
200
400
600
800
1000
0 2 4 6 8 10 12 14 16 18 20nice large value - low priority
Thro
ughput
Mbit/s
no CPU load
1.2GHz PIII TCP iperf rtt 1 ms 960 Mbit/s
94.7% kernel mode idle 1.5 % TCP iperf rtt 15 ms 777 Mbit/s
96.3% kernel mode idle 0.05 % CPULoad with nice priority
Throughput falls as priorityincreases
No Loss No Timeouts
Not enough CPU powermk5-606-g7_17Jan05
0.0010.0020.0030.0040.0050.0060.0070.0080.0090.00
100.00
0 2 4 6 8 10 12 14 16 18 20nice large value - low priority
% C
PU
mo
de
se
nd
kernel
user
nice
idle
no CPU load
0
200
400
600
800
1000
0 2 4 6 8 10 12 14 16 18 20nice large value - low priority
Thro
ughput
Mbit/s
no CPU load
2.8 GHz Xeon rtt 1 ms TCP iperf 916 Mbit/s
43% kernel mode idle 55% CPULoad with nice priority
Throughput constant as priority increases
No Loss No Timeouts
Kernel mode includes TCP stackand Ethernet driver
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 28
ATLAS Remote Computing: Application Protocol
Event request: the EFD requests an event from the SFI; the SFI replies with the event (~2 Mbytes)
Processing of the event
Return of the computation: the EF asks the SFO for buffer space; the SFO sends OK; the EF transfers the results of the computation
tcpmon, an instrumented TCP request-response program, emulates the Event Filter EFD-to-SFI communication
[Diagram: time sequence between the Event Filter Daemon (EFD), SFI and SFO: request event, send event data, process event, request buffer, send OK, send processed event; plus a request-response time histogram]
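The request-response pattern being emulated is simple to sketch. The following is our illustration on localhost, not the actual ATLAS tcpmon tool: a client sends a 64-byte request, the server returns a ~2 Mbyte "event" (as in the EFD-to-SFI exchange), and each exchange is timed:

```python
# Minimal request-response timing sketch in the spirit of tcpmon
# (hypothetical localhost demo, not the ATLAS instrumentation).
import socket, threading, time

EVENT_SIZE = 2 * 1024 * 1024            # ~2 Mbyte "event"

def recv_exact(conn, n):
    """Read exactly n bytes (or fewer only on EOF)."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            break
        data += chunk
    return data

def server(listener):
    conn, _ = listener.accept()
    with conn:
        while len(recv_exact(conn, 64)) == 64:   # full 64-byte request
            conn.sendall(b"x" * EVENT_SIZE)      # reply with the event

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=server, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
times = []
for _ in range(3):
    start = time.perf_counter()
    client.sendall(b"r" * 64)
    event = recv_exact(client, EVENT_SIZE)
    times.append(time.perf_counter() - start)
client.close()
print([f"{t * 1e3:.1f} ms" for t in times])
```

Run over a real 20 ms path instead of localhost, the per-exchange times expose exactly the slow-start and cwnd-reset effects shown on the next two slides.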
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 29
tcpmon: TCP Activity Manc-CERN Req-Resp
Web100 hooks for TCP status; round trip time 20 ms; 64-byte request (green), 1 Mbyte response (blue)
TCP is in slow start: the 1st event takes 19 rtt, or ~380 ms
The TCP congestion window gets re-set on each request: the TCP stack follows RFC 2581 & RFC 2861, reducing cwnd after inactivity
Even after 10 s, each response takes 13 rtt, or ~260 ms
Transfer achievable throughput: 120 Mbit/s
Event rate very low; the application is not happy!
[Web100 plots: DataBytesOut/DataBytesIn and CurCwnd vs time (ms), and TCP achieved throughput (Mbit/s) vs time]
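The cost of that cwnd reset can be estimated by counting slow-start rounds. A simplified sketch (initial cwnd of 2 segments, window doubling each rtt, ignoring delayed ACKs, setup and the request leg, so the measured counts on the slide are larger):

```python
import math

def slowstart_rtts(response_bytes, mss=1460, initial_cwnd=2):
    """Round trips to deliver a response in pure slow start (cwnd doubles
    each rtt; ignores delayed ACKs, connection setup and the request leg)."""
    segments = math.ceil(response_bytes / mss)
    cwnd, sent, rtts = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

# Even this lower bound for a 1 Mbyte response is many rtts; at 20 ms per
# rtt it dominates the exchange, and the RFC 2861 cwnd reset makes every
# post-idle request pay the price again.
print(slowstart_rtts(1_000_000))   # 9
```

This is why keeping cwnd open between requests (next slide) changes the response time from ~13 rtt to ~2 rtt.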
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 30
tcpmon: TCP Activity Manc-CERN Req-Resp, no cwnd reduction
Round trip time 20 ms; 64-byte request (green), 1 Mbyte response (blue)
TCP starts in slow start: the 1st event takes 19 rtt, or ~380 ms
The TCP congestion window now grows nicely: responses take 3, then 2, round trips
After ~1.5 s a response takes 2 rtt; rate ~10/s (with a 50 ms wait)
Transfer achievable throughput grows to 800 Mbit/s
Data is transferred WHEN the application requires it
[Web100 plots: DataBytesOut/DataBytesIn, packets in/out and CurCwnd vs time (ms), and TCP achieved throughput (Mbit/s) vs time]
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 31
HEP: Service Challenge 4
Objective: demo 1 Gbit/s aggregate bandwidth between RAL and 4 Tier 2 sites
RAL has SuperJANET4 and UKLight links; RAL capped firewall traffic at 800 Mbit/s
SuperJANET sites: Glasgow, Manchester, Oxford, QMUL
UKLight site: Lancaster
Many concurrent transfers from RAL to each of the Tier 2 sites
Achieved ~700 Mbit/s over UKLight and a peak of 680 Mbit/s over SJ4
[Diagram: RAL Tier 1 and Tier 2 site network: 5510/5530 switch stacks, Router A and the UKLight router; CPU + disk farms, ADS caches and Oracle RACs; 10 Gb/s and N x 1 Gb/s internal links; 4 x 1 Gb/s to CERN; 1 Gb/s to Lancaster; firewall with 1 Gb/s to SJ4]
Applications are able to sustain high rates
SuperJANET5, UKLight & the new access links are very timely
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 32
Summary & Conclusions
Well, you CAN fill the links at 1 and 10 Gbit/s, but it's not THAT simple
Packet loss is a killer for TCP: check campus links & equipment and the access links to backbones; users need to collaborate with the campus network teams and the Dante PERT
New stacks are stable and give better response & performance: still need to set the TCP buffer sizes! Check other kernel settings, e.g. the window-scale maximum, and watch for "TCP stack implementation enhancements"
TCP tries to be fair: a large MTU has an advantage; short distances (small RTT) have an advantage
TCP does not share bandwidth well with other streams
The end hosts themselves: plenty of CPU power is required for the TCP/IP stack as well as the application; packets can be lost in the IP stack through lack of processing power / CPU scheduling; the interaction between hardware, protocol processing and the disk sub-system is complex
Application architecture & implementation are also important: the TCP protocol dynamics strongly influence the behaviour of the application
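"Still need to set the TCP buffer sizes" means, at application level, sizing SO_SNDBUF/SO_RCVBUF to at least the bandwidth-delay product before connecting. A minimal sketch:

```python
import socket

def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: the socket buffer needed to fill the pipe."""
    return int(rate_bps * rtt_s / 8)

# 1 Gbit/s over the 177 ms London-Chicago-London path needs ~22 Mbytes,
# exactly the socket size quoted on the SC2004 bbftp slide.
buf = bdp_bytes(1e9, 0.177)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Buffers must be set before connect()/listen() so the TCP window scale is
# negotiated; the kernel may still cap them (net.core.[rw]mem_max on Linux).
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf)
print(buf)
sock.close()
```

If the kernel cap is lower than the BDP, the advertised window, not the network, becomes the bottleneck, whichever TCP stack is in use.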
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 33
More Information: Some URLs
UKLight web site: http://www.uklight.ac.uk
ESLEA web site: http://www.eslea.uklight.ac.uk
MB-NG project web site: http://www.mb-ng.net/
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
"Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue, 2004: http://www.hep.man.ac.uk/~rich/
TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing, 2004
PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 34
Any Questions?
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 35
Backup Slides
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 36
More Information: Some URLs 2
Lectures, tutorials etc. on TCP/IP:
www.nv.cc.va.us/home/joney/tcp_ip.htm
www.cs.pdx.edu/~jrb/tcpip.lectures.html
www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
www.jbmelectronics.com/tcp.htm
Encyclopaedia: http://www.freesoft.org/CIE/index.htm
TCP/IP resources: www.private.org.il/tcpip_rl.html
Understanding IP addresses: http://www.3com.com/solutions/en_US/ncs/501302.html
Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
Assigned protocols, ports etc. (RFC 1010): http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 37
Packet Loss with new TCP Stacks
TCP Response Function: throughput vs loss rate; further to the right means faster recovery
Packets dropped in the kernel; MB-NG rtt 6 ms, DataTAG rtt 120 ms
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester38
Drop 1 in 25,000 rtt 6.2 ms Recover in 1.6 s
High Performance TCP – MB-NG
Standard HighSpeed Scalable
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 39
FAST
As well as packet loss, FAST uses RTT to detect congestion
RTT is very stable: σ(RTT) ~ 9 ms vs 37±0.14 ms for the others
SLAC-CERN: big drops in throughput, which take several seconds to recover from
The 2nd flow never gets an equal share of the bandwidth
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 40
SACK …
Look into what's happening at the algorithmic level with Web100:
Strange hiccups in cwnd; the only correlation is with SACK arrivals
Scalable TCP on MB-NG with 200 Mbit/s CBR background
Yee-Ting Li
Networkshop 34 4-6 Apr 2006, R. Hughes-Jones Manchester 41
10 Gigabit Ethernet: Tuning PCI-X
16080-byte packets every 200 µs; Intel PRO/10GbE LR adapter; PCI-X bus occupancy vs mmrbc
Measured times, and times based on PCI-X timings from the logic analyser
Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
[Plots (kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04): PCI-X transfer time (µs) and transfer rate (Gbit/s) vs max memory read byte count (mmrbc = 512, 1024, 2048, 4096 bytes); measured vs expected, against the PCI-X maximum throughput; 5.7 Gbit/s reached at mmrbc 4096]
[Logic analyser trace: CSR access, PCI-X sequence, data transfer, interrupt & CSR update]