-
4. Behind Traditional TCP: protocols for high-throughput and high-latency networks
PA191: Advanced Computer Networking
Eva Hladká
Slides by: Petr Holub, Tomáš Rebok
Faculty of Informatics Masaryk University
Autumn 2014
Eva Hladká (FI MU) 4. Behind Traditional TCP . Autumn 2014 1 / 66
-
Lecture overview
1 Traditional TCP and its issues
2 Improving the traditional TCP
Multi-stream TCP
Web100
3 Conservative Extensions to TCP
GridDT
Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
4 TCP Extensions with IP Support
QuickStart, E-TCP, FAST
5 Approaches Different from TCP
tsunami
RBUDP
XCP
SCTP, DCCP, STP, Reliable UDP, XTP
6 Conclusions
7 Literature
-
Traditional TCP
Protocols for reliable data transmission
Protocols for reliable data transmission have to:
ensure the reliability of the transfer
retransmissions of lost packets
FEC might be usefully employed
ensure a protection from congestion
of the network and of the receiver
Behavior evaluation:
aggressiveness – ability to utilise available bandwidth
responsiveness – ability to recover from a packet loss
fairness – getting a fair portion of network throughput when more streams/participants use the network
-
Traditional TCP
Problem statement
network links with high capacity and high latency
iGrid 2005: San Diego ↔ Brno, RTT = 205 ms
SC|05: Seattle ↔ Brno, RTT = 174 ms
traditional TCP is not suitable for such an environment:
10 Gb/s, RTT = 100 ms, 1500 B MTU
=⇒ a sending/outstanding window of ≈ 83 333 packets
=⇒ to sustain the full rate, at most a single packet may be lost per ≈ 1:36 hour
1. the recovery is terribly slow
2. if errors are more frequent, the maximum throughput cannot be reached
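The window arithmetic on this slide can be checked with a few lines (a back-of-the-envelope sketch; the function names are illustrative):

```python
# Window needed to fill a 10 Gb/s, RTT = 100 ms path with 1500 B packets,
# and how long a single AIMD recovery takes at that scale.

def window_packets(rate_bps: float, rtt_s: float, mtu_bytes: int) -> float:
    """Bandwidth-delay product expressed in MTU-sized packets."""
    return rate_bps * rtt_s / (mtu_bytes * 8)

def recovery_time_s(window: float, rtt_s: float) -> float:
    """After halving, AIMD regains the window at +1 packet per RTT,
    so recovery takes window/2 RTTs."""
    return (window / 2) * rtt_s

w = window_packets(10e9, 0.100, 1500)
print(round(w))                          # ~83 333 outstanding packets
print(round(recovery_time_s(w, 0.100)))  # ~4167 s for one recovery
```

The recovery time alone is over an hour of loss-free transmission, which is why the slide calls the behavior "terribly slow".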
How can a better network utilization be achieved?
How can a reasonable co-existence with traditional TCP be ensured?
How can a gradual deployment of a new protocol be ensured?
-
Traditional TCP
Traditional TCP I.
flow control vs. congestion control
[Diagrams: sender – network – receiver, shown without any control, with flow control (rwnd), and with congestion control (cwnd)]
-
Traditional TCP
Traditional TCP I.
-
Traditional TCP
Traditional TCP II.
Flow control
an explicit feedback from the receiver(s) using rwnd (deterministic)
Congestion control
an approximate sender-side estimation of the available throughput (using cwnd)
the final window used: ownd
ownd = min{rwnd, cwnd}
The bandwidth bw can be computed as:
bw = 8 · MSS · ownd / RTT    (1)
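Equation (1) is easy to sanity-check in code (a trivial sketch; the MSS and window values are made up):

```python
def throughput_bps(mss_bytes: int, ownd_packets: int, rtt_s: float) -> float:
    """Equation (1): bw = 8 * MSS * ownd / RTT (result in bits per second)."""
    return 8 * mss_bytes * ownd_packets / rtt_s

# ownd = min{rwnd, cwnd}: the smaller of the two windows limits the sender
ownd = min(64, 48)
print(throughput_bps(1460, ownd, 0.100))  # 48 packets of 1460 B per 100 ms
```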
-
Traditional TCP
Traditional TCP II. – Flow Control
-
Traditional TCP
Traditional TCP – Tahoe and Reno
Congestion control: traditionally based on the AIMD (Additive Increase Multiplicative Decrease) approach
Tahoe [1]
cwnd = cwnd + 1 . . . per RTT without loss (above ssthresh)
ssthresh = 0.5 · cwnd, cwnd = 1 . . . per every loss
Reno [2] adds
fast retransmission
a TCP receiver sends an immediate duplicate ACK when an out-of-order segment arrives
all segments after the dropped one trigger duplicate ACKs
a loss is indicated by 3 duplicate ACKs (≈ four successive identical ACKs without intervening packets)
once received, TCP performs a fast retransmission without waiting for the retransmission timer to expire
fast recovery – the slow-start phase is not used any more
ssthresh = cwnd = 0.5 · cwnd
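The Tahoe/Reno rules can be sketched as a per-RTT toy simulation (an idealised model, not kernel code; the loss schedule and initial ssthresh are made up):

```python
def reno_cwnd_trace(rtts, loss_at, ssthresh=16):
    """Idealised per-RTT evolution of Reno's cwnd (in MSS units):
    slow start doubles cwnd up to ssthresh, congestion avoidance adds 1
    per RTT, and a loss (3 dup-ACKs) halves both ssthresh and cwnd."""
    cwnd, trace = 1, []
    for t in range(rtts):
        if t in loss_at:                 # fast retransmit + fast recovery
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh              # Reno: no return to slow start
        elif cwnd < ssthresh:
            cwnd *= 2                    # slow start
        else:
            cwnd += 1                    # additive increase
        trace.append(cwnd)
    return trace

trace = reno_cwnd_trace(30, loss_at={20})
print(trace[19], trace[20])   # window before and after the loss
```

Swapping the loss branch for `cwnd = 1` turns this into the Tahoe behavior.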
-
Traditional TCP
Traditional TCP – Tahoe I.
-
Traditional TCP
Traditional TCP – Tahoe II.
[Plot: cwnd and ssthresh [MSS, 0–30] vs. time [RTT, 0–50] for Tahoe]
-
Traditional TCP
Traditional TCP – Reno I.
-
Traditional TCP
Traditional TCP – Reno II.
[Plot: cwnd and ssthresh [MSS, 0–30] vs. time [RTT, 0–50] for Reno]
-
Traditional TCP
TCP Vegas
Vegas – a concept of congestion control [3]
when a network is congested, the RTT becomes higher
RTT is monitored during the transmission
when an RTT increase is detected, the congestion window size is linearly reduced
a possibility to measure the available network bandwidth using inter-packet spacing/dispersion
-
Traditional TCP
Traditional TCP
a reaction to packet loss – retransmission
Tahoe: the whole actual window ownd
Reno: a single segment in the Fast Retransmission mode
NewReno: more segments in the Fast Retransmission mode
Selective Acknowledgement (SACK): just the lost packets
fundamental question: How can a sufficient size of cwnd be achieved (under real conditions) in a network with high capacity and high RTT . . . without affecting/disallowing the “common” users from using the network?
-
Traditional TCP
Traditional TCP – Response Function
Response Function represents the relation between bw and a steady-state packet loss rate p
cwnd_average ≈ 1.2/√p (for MSS-sized segments)
using (1): bw ≈ 9.6 · MSS / (RTT · √p)
the responsiveness of traditional TCP
assuming that a packet is lost when cwnd = bw · RTT, the recovery time is:
ϱ = bw · RTT² / (2 · MSS)
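The response function translates directly into code (a small sketch; the 10 Gb/s target is the slide's running example, the MSS value is illustrative):

```python
import math

def avg_cwnd(p: float) -> float:
    """cwnd_average ≈ 1.2 / sqrt(p), in MSS-sized segments."""
    return 1.2 / math.sqrt(p)

def response_bw(mss_bytes: int, rtt_s: float, p: float) -> float:
    """Combining with (1): bw ≈ 9.6 * MSS / (RTT * sqrt(p))."""
    return 9.6 * mss_bytes / (rtt_s * math.sqrt(p))

# loss rate a standard TCP flow may tolerate to average 10 Gb/s
# with MSS = 1460 B and RTT = 100 ms:
p = (9.6 * 1460 / (0.100 * 10e9)) ** 2
print(p)  # on the order of 2e-10, far below realistic loss rates
```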
-
Traditional TCP
Traditional TCP – Responsiveness
[Plot: TCP responsiveness – recovery time (s, 0–18 000) vs. RTT (ms, 0–200) for link capacities C = 622 Mbit/s, 2.5 Gbit/s, and 10 Gbit/s]
-
Traditional TCP
Traditional TCP – Fairness I.
fairness at the point of equilibrium
fairness is considered for:
streams with different RTT
streams with different MTU
The speed of convergence to the point of equilibrium DOES matter!
-
Traditional TCP
Traditional TCP – Fairness II.
cwnd += MSS, cwnd *= 0.5 (30 steps)
[Plot: trajectories of the two flows converging toward the fair share, axes 0–100 × 0–100]
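The convergence dynamics shown here can be reproduced with a few lines (a Chiu–Jain style toy simulation; the capacity of 100 and the starting rates are illustrative):

```python
def aimd_fairness(x, y, capacity=100.0, steps=30, beta=0.5):
    """Two flows start from unequal rates; each adds 1 per step and both
    multiply by beta whenever the link is overloaded. The multiplicative
    decrease shrinks the gap between the flows, the additive increase
    keeps it constant, so the rates converge toward the fair share."""
    for _ in range(steps):
        if x + y > capacity:
            x, y = x * beta, y * beta
        else:
            x, y = x + 1, y + 1
    return x, y

x, y = aimd_fairness(80.0, 10.0)
print(x, y, abs(x - y))  # the initial gap of 70 has shrunk
```

Replacing `beta=0.5` with `0.83` reproduces the slower convergence of the next slide.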
-
Traditional TCP
Traditional TCP – Fairness III.
cwnd += MSS, cwnd *= 0.83 (30 steps)
[Plot: trajectories of the two flows converging more slowly toward the fair share, axes 0–100 × 0–100]
-
Improving the traditional TCP Multi-stream TCP
Multi-stream TCP
assumes multiple TCP streams transferring a single data flow
in fact, improves the TCP performance/behavior only in cases of isolated packet losses
a loss of more packets usually affects more TCP streams
usually available, thanks to a simple implementation
bbftp, GridFTP, Internet Backplane Protocol, . . .
drawbacks:
more complicated than traditional TCP (more threads are necessary)
the startup is accelerated only linearly
leads to a synchronous overloading of queues and caches in the routers
-
Improving the traditional TCP Multi-stream TCP
TCP implementation tuning I.
cooperation with HW
Rx/Tx TCP Checksum Offloading – ordinarily available
zero copy
accessing the network usually leads to several data copies: user-land ↔ kernel ↔ network card
page flipping – user-land ↔ kernel data movement
support for, e.g., sendfile()
implementations for Linux, FreeBSD, Solaris, . . .
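The sendfile() idea can be demonstrated locally (a minimal sketch; a socketpair stands in for a real TCP connection, and `os.sendfile` wraps the kernel call on Linux/FreeBSD-like systems):

```python
# Zero-copy sending: file pages travel from the kernel's page cache to the
# socket without an intermediate copy through user-space buffers.
import os
import socket
import tempfile

payload = b"x" * 16384
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

a, b = socket.socketpair()
with open(path, "rb") as src:
    sent = 0
    while sent < len(payload):
        # offset/count bookkeeping handles short writes
        sent += os.sendfile(a.fileno(), src.fileno(), sent, len(payload) - sent)

received = b.recv(len(payload), socket.MSG_WAITALL)
a.close()
b.close()
os.unlink(path)
print(sent, len(received))
```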
-
Improving the traditional TCP Web100
TCP implementation tuning II.
Web100 [4, 5]
a software that implements instruments in the Linux TCP/IP stack – the TCP Kernel Instrumentation Set (TCP-KIS)
more than 125 instruments
information available via /proc
distributed in two pieces:
a kernel patch adding the instruments
a suite of “userland” libraries and tools for accessing the kernel instrumentation (command-line, GUI)
the Web100 software allows:
monitoring (extended statistics)
instruments’ tuning
support for auto-tuning
-
Conservative Extensions to TCP GridDT
GridDT
a collection of ad-hoc modifications :(
correction of ssthresh
faster slow start
AIMD modification for congestion control:
cwnd = cwnd + a . . . per RTT without packet loss
cwnd = b · cwnd . . . per packet loss
just the sender’s side has to be modified
-
Conservative Extensions to TCP GridDT
GridDT – fairness
[Plot: trajectories of two GridDT flows, axes 0–100 × 0–100]
-
Conservative Extensions to TCP GridDT
GridDT – example
[Figure: testbed topology – hosts at CERN (GVA) attached via 1 GE to a GbE switch; POS 2.5 Gb/s to Starlight (CHI) and POS 10 Gb/s to Sunnyvale; the 1 GE links form the bottleneck]
TCP Reno performance (see slide #8):
First stream GVA → Sunnyvale: RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s
Second stream GVA → CHI: RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s
Link utilization 71.6 %
GridDT tuning in order to improve fairness between two TCP streams with different RTT:
First stream GVA → Sunnyvale: RTT = 181 ms, additive increment A = 7; average throughput = 330 Mb/s
Second stream GVA → CHI: RTT = 117 ms, additive increment B = 3; average throughput = 388 Mb/s
Link utilization 71.8 %
[Plot: throughput (Mbps, 0–1000) of the two streams (A = 7, RTT = 181 ms; B = 3, RTT = 117 ms) sharing a 1 Gb/s bottleneck over 0–6000 s, with averages over the life of each connection]
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
Scalable TCP
proposed by Tom Kelly [1]
congestion control is not AIMD any more:
cwnd = cwnd + 0.01 · cwnd . . . per RTT without packet loss
(cwnd = cwnd + 0.01 . . . per ACK)
cwnd = 0.875 · cwnd . . . per packet loss
=⇒ Multiplicative Increase Multiplicative Decrease (MIMD)
for smaller window sizes and/or a higher loss rate in the network, Scalable TCP switches into AIMD mode
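The MIMD rules make the recovery time independent of the window size, which can be verified numerically (a sketch; the starting window of 1000 MSS is arbitrary):

```python
import math

def scalable_update(cwnd, event):
    """Scalable TCP's MIMD rules: +1% of cwnd per loss-free RTT
    (i.e. +0.01 per ACK) and *0.875 on a packet loss."""
    if event == "rtt_ok":
        return cwnd * 1.01
    if event == "loss":
        return cwnd * 0.875
    return cwnd

cwnd = 1000.0
after_loss = scalable_update(cwnd, "loss")
# number of loss-free RTTs needed to regain the pre-loss window:
rtts = math.log(cwnd / after_loss) / math.log(1.01)
print(round(rtts, 1))
```

The result (about 13 RTTs) does not depend on `cwnd`, which is exactly the property illustrated on the next slide.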
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
Scalable TCP
Figure: Packet loss recovery times for the traditional TCP (left) are proportional to cwnd and RTT. A Scalable TCP connection (right) has packet loss recovery times that are proportional to the connection’s RTT only. (Note: link capacity c < C)
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
Scalable TCP – fairness I.
Two concurrent Scalable TCP streams; Scalable control switched on above 30 Mb/s; twice the number of steps in comparison with the previous simulations
[Plot: trajectories of the two flows, axes 0–100 × 0–100]
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
Scalable TCP – fairness II.
Scalable TCP and traditional TCP streams; Scalable control switched on above 30 Mb/s; twice the number of steps
[Plot: trajectories of the two flows, axes 0–100 × 0–100]
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
Scalable TCP – Response curve
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
High-Speed TCP (HSTCP)
Sally Floyd, RFC3649, [2]
congestion control AIMD/MIMD:
cwnd = cwnd + a(cwnd) . . . per RTT without loss
(cwnd = cwnd + a(cwnd)/cwnd . . . per ACK)
cwnd = b(cwnd) · cwnd . . . per packet loss
emulates the behavior of traditional TCP for small window sizesand/or higher packet loss rates in the network
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
High-Speed TCP (HSTCP)
proposed MIMD parametrization:
b(cwnd) = −0.4 · (log(cwnd) − 3.64) / 7.69 + 0.5
a(cwnd) = 2 · cwnd² · b(cwnd) / (12.8 · (2 − b(cwnd)) · cwnd^1.2)
[Plot: HSTCP cwnd [MSS], growing to ≈ 80 000 MSS over 200–2000 RTTs]
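The parametrization can be evaluated numerically (a sketch; the two probe windows are illustrative, 38 MSS being roughly where HSTCP hands over to standard TCP behavior):

```python
import math

def b_hstcp(w: float) -> float:
    """Decrease factor from the slide's parametrization."""
    return -0.4 * (math.log(w) - 3.64) / 7.69 + 0.5

def a_hstcp(w: float) -> float:
    """Per-RTT additive increase derived from b(w)."""
    return 2 * w**2 * b_hstcp(w) / (12.8 * (2 - b_hstcp(w)) * w**1.2)

# near w = 38 MSS the rules degenerate to standard TCP (a ≈ 1, b ≈ 0.5);
# at w ≈ 83 000 MSS the increase is tens of MSS per RTT with a mild backoff:
print(a_hstcp(38), b_hstcp(38))
print(a_hstcp(83000), b_hstcp(83000))
```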
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
High-Speed TCP (HSTCP)
a parametrization equivalent to Scalable TCP is possible ⇒ Linear HSTCP
a comparison with Multi-stream TCP:
N(cwnd) ≈ 0.23 · cwnd^0.4
N(cwnd) – the number of parallel TCP connections emulated by the HighSpeed TCP response function with congestion window cwnd
Neither Scalable TCP nor HSTCP deals with the slow-start phase in any sophisticated way.
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
HSTCP – Response curve
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
H-TCP I.
created by researchers at the Hamilton Institute in Ireland
a simple change to the cwnd increase function
increases its aggressiveness (in particular, the rate of additive increase) as the time since the previous loss (backoff) increases
the increase rate α is a function of the elapsed time since the last backoff
the AIMD mechanism is used
preserves many of the key properties of standard TCP: fairness, responsiveness, relationship to buffering
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
H-TCP II.
∆ . . . time elapsed since the last congestion event
∆L . . . for ∆ ≤ ∆L the standard TCP growth is used
∆B . . . the bandwidth-change threshold above which the 0.5 decrease is used (i.e., for significant bandwidth changes)
Tmin, Tmax . . . the minimal resp. maximal RTTs measured
B(k) . . . the maximum throughput measured during the last interval without packet loss
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
H-TCP III.
cwnd = cwnd + 2(1 − β) · a(∆)/cwnd . . . per ACK
cwnd = b(B) · cwnd . . . per loss
a(∆) = 1 for ∆ ≤ ∆L; max{a′(∆) · Tmin, 1} for ∆ > ∆L
b(B) = 0.5 if |B(k+1) − B(k)| / B(k) > ∆B; min{Tmin/Tmax, 0.8} otherwise
a′(∆) = 1 + 10(∆ − ∆L) + 0.5(∆ − ∆L)² . . . quadratic increment function
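The increase function can be coded directly (a sketch; ∆L = 1 s and Tmin = 0.1 s are illustrative values, not mandated by H-TCP):

```python
def a_htcp(delta: float, delta_l: float = 1.0, t_min: float = 0.1) -> float:
    """H-TCP increase factor: standard TCP growth until delta_l seconds
    after the last backoff, then the quadratic ramp a'(delta) scaled by
    the minimum RTT (and clamped to at least 1)."""
    if delta <= delta_l:
        return 1.0
    a_prime = 1 + 10 * (delta - delta_l) + 0.5 * (delta - delta_l) ** 2
    return max(a_prime * t_min, 1.0)

# shortly after a backoff the flow behaves like standard TCP,
# two seconds past delta_l it already adds >2 MSS per update:
print(a_htcp(0.5), a_htcp(3.0))
```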
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
BIC-TCP
the default algorithm in Linux kernels (2.6.8 and above)
uses a binary-search algorithm for cwnd updates [3]
4 phases:
(1) a reaction to a packet loss
(2) additive increase
(3) binary search
(4) maximum probing
[Diagram: window growth through the additive increase, binary search, and maximum probing phases; additive steps are capped at Smax]
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
BIC-TCP
(1) Packet loss
BIC-TCP starts from the TCP slow start
when a loss is detected, it uses multiplicative decrease (as standard TCP) and records the window sizes just before and after the loss event:
previous window size → Wmax (the size of cwnd before the loss)
reduced window size → Wmin (the size of cwnd after the loss)
=⇒ because the loss occurred when cwnd ≤ Wmax, the point of equilibrium of cwnd will be searched for in the range ⟨Wmin; Wmax⟩
(2) Additive increase
starting the search from cwnd = (Wmin + Wmax)/2 might be too challenging for the network
thus, when (Wmin + Wmax)/2 > Wmin + Smax, additive increase takes place → cwnd = Wmin + Smax
the window then increases linearly by Smax every RTT
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
BIC-TCP
(3) Binary search
once the target (cwnd = (Wmin + Wmax)/2) is reached without a loss, Wmin = cwnd is set
otherwise (a packet loss happened), Wmax = cwnd
the search then continues toward the new target (using additive increase, if necessary) until the change of cwnd is less than the Smin constant
here, cwnd = Wmax is set
Phases (2) and (3) yield a linear (additive) increase which turns into a logarithmic one (binary search).
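Phases (2) and (3) amount to a binary search with capped steps, which can be sketched in the loss-free case as follows (the Smax and Smin values are illustrative):

```python
def bic_next_target(w_min, w_max, s_max=32, s_min=0.5):
    """One loss-free BIC step: jump to the midpoint unless it is more
    than s_max away (then increase additively); once the remaining step
    falls below s_min, settle at w_max."""
    mid = (w_min + w_max) / 2
    if mid - w_min > s_max:
        return w_min + s_max          # (2) additive increase phase
    if mid - w_min < s_min:
        return w_max                  # search converged
    return mid                        # (3) binary search phase

w_min, w_max = 500.0, 1000.0
steps = []
while w_min < w_max:
    w_min = bic_next_target(w_min, w_max)
    steps.append(w_min)
print(steps)  # linear ramp first, then ever smaller binary-search steps
```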
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
BIC-TCP
(4) Maximum probing
the inverse process to phases (3) and (2)
first, the inverse binary search takes place (until the cwnd growth is greater than Smax)
once the cwnd growth is greater than Smax, linear growth (by a reasonably large fixed increment) takes place
first exponential growth, then linear growth
Assumed benefits:
traditional TCP “friendliness”
during the “plateau” (3), competing TCP flows are able to grow
AIMD behavior (even though faster) during phases (2) and (4)
more stable window size ⇒ better network utilization
most of the time, BIC-TCP should stay in the “plateau” (3)
-
Conservative Extensions to TCP Scalable TCP, High-Speed TCP, H-TCP, BIC-TCP, CUBIC-TCP
CUBIC-TCP
even though it is pretty scalable, fair, and stable, BIC’s growth function is considered to be still too aggressive for TCP
especially under short RTTs or in low-speed networks
CUBIC-TCP
a new release of BIC, which uses a cubic growth function
for the purpose of simplicity in protocol analysis, the number of phases was further reduced
Wcubic = C (T − K)³ + Wmax
where C is a scaling constant, T is the time elapsed since the last loss event, Wmax is the window size before the loss event, K = ∛(Wmax · β / C), and β is a constant decrease factor
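The cubic window function is easy to evaluate (a sketch assuming the commonly used constants C = 0.4 and β = 0.2, which are not stated on the slide):

```python
def w_cubic(t: float, w_max: float, beta: float = 0.2, c: float = 0.4) -> float:
    """W_cubic(T) = C*(T - K)^3 + W_max with K = cbrt(W_max * beta / C)."""
    k = (w_max * beta / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max

k = (100 * 0.2 / 0.4) ** (1 / 3)
# concave growth back toward the old maximum, reaching W_max at T = K,
# then convex probing beyond it:
print(w_cubic(0, 100), w_cubic(k, 100), w_cubic(2 * k, 100))
```

At T = 0 the window equals (1 − β)·Wmax, i.e. the post-loss window, so the cubic curve replaces all of BIC's explicit phases with a single function.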
-
TCP Extensions with IP Support QuickStart, E-TCP, FAST
Quickstart (QS)/Limited Slowstart I.
there is a strong assumption that the slow-start phase cannot be improved without an interaction with lower network layers
a proposal: a 4-byte option in the IP header, comprising QS TTL and Initial Rate fields
a sender that wants to use QS sets the QS TTL to an arbitrary (but high enough) value, sets the Initial Rate to the requested rate at which it wants to start sending, and sends the SYN packet
-
TCP Extensions with IP Support QuickStart, E-TCP, FAST
Quickstart (QS)/Limited Slowstart II.
each router on the path that supports QS decreases the QS TTL by one and decreases the Initial Rate, if necessary
the receiver sends the QS TTL and Initial Rate back to the sender in the SYN/ACK packet
the sender knows whether all the routers on the path support QS (by comparing the QS TTL with the TTL)
the sender sets the appropriate cwnd and starts using its congestion control mechanism (e.g., AIMD)
Requires changes in the IP layer! :-(
-
TCP Extensions with IP Support QuickStart, E-TCP, FAST
E-TCP I.
Explicit Congestion Notification (ECN)
a component of Active Queue Management (AQM)
a bit set by routers when congestion of a link/buffer/queue is imminent
the ECN flag has to be mirrored by the receiver
TCP should react to the ECN bit being set in the same way as to a packet loss
requires the routers’ administrators to configure AQM/ECN :-(
-
TCP Extensions with IP Support QuickStart, E-TCP, FAST
E-TCP II.
E-TCP
proposes to mirror the ECN bit just once (for the first time only)
freezes the cwnd when an ACK with the ECN bit set is received from the receiver
requires introducing small (synthetic) losses into the network in order to perform multiplicative decrease, for the sake of fairness
requires a change in the receivers’ reaction to the ECN bit :-(
-
TCP Extensions with IP Support QuickStart, E-TCP, FAST
FAST
Fast AQM Scalable TCP (FAST) [5]
uses end-to-end delay, ECN, and packet losses for congestion detection/avoidance
if too few packets are queued in the routers (detected by RTT monitoring), the sending rate is increased
differences from TCP Vegas:
TCP Vegas makes fixed-size adjustments to the rate, independent of how far the current rate is from the target rate
FAST TCP makes larger steps when the system is further from equilibrium and smaller steps near equilibrium
if ECN is available in the network, FAST TCP can be extended to use ECN marking to replace/supplement queueing delay and packet loss as the congestion measure
-
Approaches Different from TCP tsunami
tsunami
TCP connection for out-of-band control channel
connection parameters negotiation
requests for retransmissions – uses NACKs instead of ACKs
connection termination negotiation
UDP channel for data transmission
MIMD congestion control
highly configurable/customizable
MIMD parameters, loss threshold, maximum size of the retransmission queue, the interval of sending retransmission requests, etc.
-
Approaches Different from TCP RBUDP
Reliable Blast UDP – RBUDP
similar to tsunami – an out-of-band TCP channel for control, UDP for data transmission
proposed for disk-to-disk transmissions, resp. transmissions where the complete transmitted data can be held in the sender’s memory
sends data at a user-defined rate
app_perf (a clone of iperf) is used for an estimation of the network’s/receiver’s capacity
-
Approaches Different from TCP RBUDP
Reliable Blast UDP – RBUDP
[Figure 1: the time sequence diagram of RBUDP – UDP data traffic and TCP signaling traffic between sender and receiver, steps A–G]
Source: E. He, J. Leigh, O. Yu, T. A. DeFanti, “Reliable Blast UDP: Predictable High Performance Bulk Data Transfer,” IEEE Cluster Computing 2002, Chicago, Illinois, Sept. 2002.
A – start of the transmission (using the pre-defined rate)
B – end of the transmission
C – sending the DONE signal via the control channel; the receiver responds with a bitmap of the data that have arrived
D – re-sending of missing data
E–F–G – end of transmission
The steps C and D repeat until all the data are delivered.
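The C/D loop can be sketched as follows (a toy model; `send_udp` and `request_missing` stand in for real UDP/TCP sockets, and the simulated loss of packet 3 is made up):

```python
def rbudp_transfer(data, send_udp, request_missing):
    """RBUDP control loop: blast all pending packets over UDP, then ask
    the receiver, over the TCP control channel, for the set of missing
    packets (steps C and D), and repeat until everything has arrived."""
    pending = set(range(len(data)))
    while pending:
        for seq in sorted(pending):
            send_udp(seq, data[seq])       # UDP blast (A–B, later D)
        pending = request_missing()        # DONE + bitmap over TCP (C)

# toy in-memory "network" that drops packet 3 on the first pass:
delivered, first_pass = {}, [True]

def send_udp(seq, payload):
    if first_pass[0] and seq == 3:
        return                             # simulated packet loss
    delivered[seq] = payload

def request_missing():
    first_pass[0] = False
    return {s for s in range(5) if s not in delivered}

rbudp_transfer([b"p%d" % i for i in range(5)], send_udp, request_missing)
print(sorted(delivered))
```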
-
Approaches Different from TCP XCP
eXplicit Control Protocol – XCP
uses per-packet feedback from routers
[Diagrams: each packet carries a congestion header with Feedback, Round Trip Time, and Congestion Window fields; routers along the path adjust the Feedback field (e.g. one router sets Feedback = +0.1 packet, a later one Feedback = −0.3 packet), and on receiving the feedback the sender applies Congestion Window = Congestion Window + Feedback]
-
Approaches Different from TCP SCTP, DCCP, STP, Reliable UDP, XTP
Different approaches I.
SCTP
multi-stream, multi-homed transport (an end node might have several IP addresses)
message-oriented like UDP; ensures reliable, in-sequence transport of messages with congestion control like TCP
http://www.sctp.org/
DCCP
a non-reliable protocol (like UDP) with a congestion control compatible with TCP
http://www.ietf.org/html.charters/dccp-charter.html
http://www.icir.org/kohler/dcp/
-
Approaches Different from TCP SCTP, DCCP, STP, Reliable UDP, XTP
Different approaches II.
STP
based on CTS/RTS
a simple protocol designed for a simple implementation in HW
without any sophisticated congestion control mechanism
http://lwn.net/2001/features/OLS/pdf/pdf/stlinux.pdf
Reliable UDP
ensures reliable and in-order delivery (up to the maximum number of retransmissions)
RFC 908 and RFC 1151
originally proposed for IP telephony
connection parameters can be set per-connection
http://www.javvin.com/protocolRUDP.html
XTP (Xpress Transfer Protocol), . . .
-
Conclusions
Conclusions I.
Current state:
multi-stream TCP is intensively used (e.g., by Grid applications)
looking for a way which will allow safe (i.e., backward-compatible) development/deployment of post-TCP protocols
aggressive protocols are used on private/dedicated networks/circuits (e.g., λ-networks CzechLight/CESNET2, SurfNet, CA*net 4, . . . )
implementation of SCTP under FreeBSD 7.0
implementation of DCCP under Linux
-
Conclusions
Conclusions II.
interaction with L3 (IP)
interaction with data link layer
variable delay and throughput in wireless networksoptical burst switching
specific per-flow states in routers:
e.g., per-flow setting for packet loss generation (→ E-TCP)
may help short-term flows with high capacity demands (macro-bursts)
problem with scalability and cost :-(
-
Literature
Literature
Jacobson V. “Congestion Avoidance and Control”, Proceedings of ACM SIGCOMM ’88 (Stanford, CA, Aug. 1988), pp. 314–329. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
Allman M., Paxson V., Stevens W. “TCP Congestion Control”, RFC2581, Apr. 1999.http://www.rfc-editor.org/rfc/rfc2581.txt
Brakmo L., Peterson L. “TCP Vegas: End to End Congestion Avoidance on a GlobalInternet”, IEEE Journal of Selected Areas in Communication, Vol. 13, No. 8, pp.1465–1480, Oct. 1995. ftp://ftp.cs.arizona.edu/xkernel/Papers/jsac.ps
http://www.web100.org
Hacker T. J., Athey B. D., Sommerfield J. “Experiences Using Web100 for End-To-EndNetwork Performance Tuning”http://www.web100.org/docs/ExperiencesUsingWeb100forHostTuning.pdf
-
Literature
Literature
Kelly T. “Scalable TCP: Improving Performance in Highspeed Wide Area Networks”,PFLDnet 2003,http://datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf,http://wwwlce.eng.cam.ac.uk/~ctk21/scalable/
Floyd S. “HighSpeed TCP for Large Congestion Windows”, 2003,http://www.potaroo.net/ietf/all-ids/draft-floyd-tcp-highspeed-03.txt
BIC-TCP, http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/
Floyd S., Allman M., Jain A., Sarolahti P. “Quick-Start for TCP and IP”, 2006,http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-quickstart-02.txt
Jin C., Wei D., Low S. H., Buhrmaster G., Bunn J., Choe D. H., Cottrell R. L. A., Doyle J.C., Newman H., Paganini F., Ravot S., Singh S. “FAST – Fast AQM Scalable TCP.”http://netlab.caltech.edu/FAST/
http://netlab.caltech.edu/pub/papers/FAST-infocom2004.pdf
tsunami, http://www.anml.iu.edu/anmlresearch.html
He E., Leigh J., Yu O., DeFanti T. A. “Reliable Blast UDP: Predictable High Performance Bulk Data Transfer”, IEEE Cluster Computing 2002, Chicago, Illinois, Sept. 2002.
-
Literature
Further materials
Workshops PFLDnet 2003–2010
http://datatag.web.cern.ch/datatag/pfldnet2003/program.html
http://www-didc.lbl.gov/PFLDnet2004/
http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
http://www.hpcc.jp/pfldnet2006/
http://wil.cs.caltech.edu/pfldnet2007/
Prof. Sally Floyd’s pages:
http://www.icir.org/floyd/papers.html
RFC3426 – “General Architectural and Policy Considerations”
http://www.hamilton.ie/net/eval/results_HI2005.pdf