New designs for Internet congestion control

Damon Wischik (UCL)
http://www.cs.ucl.ac.uk/staff/D.Wischik

Some Internet History

• 1974: First draft of TCP/IP
  “A protocol for packet network interconnection”, Vint Cerf and Robert Kahn

• 1983: ARPANET switches on TCP/IP

• 1986: Congestion collapse

• 1988: Congestion control for TCP
  “Congestion avoidance and control”, Van Jacobson

“A Brief History of the Internet”, the Internet Society

Sizing router buffers
SIGCOMM 2004

Abstract. All Internet routers contain buffers to hold packets during times of congestion. Today, the size of the buffers is determined by the dynamics of TCP's congestion control algorithm. In particular, the goal is to make sure that when a link is congested, it is busy 100% of the time; which is equivalent to making sure its buffer never goes empty. A widely used rule-of-thumb states that each link needs a buffer of size B = RTT*C, where RTT is the average round-trip time of a flow passing across the link, and C is the data rate of the link. For example, a 10Gb/s router linecard needs approximately 250ms*10Gb/s = 2.5Gbits of buffers; and the amount of buffering grows linearly with the line-rate. Such large buffers are challenging for router manufacturers, who must use large, slow, off-chip DRAMs. And queueing delays can be long, have high variance, and may destabilize the congestion control algorithms. In this paper we argue that the rule-of-thumb (B = RTT*C) is now outdated and incorrect for backbone routers. This is because of the large number of flows (TCP connections) multiplexed together on a single backbone link. Using theory, simulation and experiments on a network of real routers, we show that a link with N flows requires no more than B = (RTT*C)/sqrt(N), for long-lived or short-lived TCP flows. The consequences on router design are enormous: A 2.5Gb/s link carrying 10,000 flows could reduce its buffers by 99% with negligible difference in throughput; and a 10Gb/s link carrying 50,000 flows requires only 10Mbits of buffering, which can easily be implemented using fast, on-chip SRAM.

Guido Appenzeller, Isaac Keslassy, Nick McKeown (Stanford University)

http://tiny-tera.stanford.edu/~nickm/papers/index.html
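
The arithmetic in the abstract can be checked with a few lines of C++. This is a minimal sketch rather than code from the paper: the function names are hypothetical, and it assumes the revised rule has the form B = RTT*C/sqrt(N), which is the form consistent with the abstract's own numbers (a 99% reduction at 10,000 flows, and roughly 10 Mbit at 10 Gb/s with 50,000 flows).

```cpp
#include <cmath>

// Rule-of-thumb buffer size in bits: B = RTT * C.
// rtt_sec: average round-trip time [s]; rate_bps: link data rate [bit/s].
double rule_of_thumb_bits(double rtt_sec, double rate_bps) {
    return rtt_sec * rate_bps;
}

// Assumed revised rule: B = RTT * C / sqrt(N) for N multiplexed flows.
double small_buffer_bits(double rtt_sec, double rate_bps, double n_flows) {
    return rule_of_thumb_bits(rtt_sec, rate_bps) / std::sqrt(n_flows);
}

// Fraction of the rule-of-thumb buffer that is saved: 1 - 1/sqrt(N).
double buffer_reduction(double n_flows) {
    return 1.0 - 1.0 / std::sqrt(n_flows);
}
```

For example, `rule_of_thumb_bits(0.25, 10e9)` gives 2.5 Gbit, `small_buffer_bits(0.25, 10e9, 50000)` gives about 11 Mbit, and `buffer_reduction(10000)` gives 0.99.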

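TCP

// Sender-side handling of an incoming ACK carrying cumulative sequence number seqno.
if (seqno > _last_acked) {  // the ACK covers new data
    if (!_in_fast_recovery) {
        // Normal case: record the ACK, grow the window, keep sending.
        _last_acked = seqno;
        _dupacks = 0;
        inflate_window();
        send_packets(now);
        _last_sent_time = now;
        return;
    }
    if (seqno < _recover) {
        // Partial ACK during fast recovery: deflate the window by the newly
        // acknowledged data, add back one segment, and retransmit.
        uint32_t new_data = seqno - _last_acked;
        _last_acked = seqno;
        if (new_data < _cwnd) _cwnd -= new_data; else _cwnd = 0;
        _cwnd += _mss;
        retransmit_packet(now);
        send_packets(now);
        return;
    }
    // The ACK reaches _recover: leave fast recovery and deflate the window.
    uint32_t flightsize = _highest_sent - seqno;
    _cwnd = min(_ssthresh, flightsize + _mss);
    _last_acked = seqno;
    _dupacks = 0;
    _in_fast_recovery = false;
    send_packets(now);
    return;
}
if (_in_fast_recovery) {
    // Duplicate ACK during fast recovery: inflate the window by one segment.
    _cwnd += _mss;
    send_packets(now);
    return;
}
_dupacks++;  // duplicate ACK outside fast recovery
if (_dupacks != 3) {
    send_packets(now);
    return;
}
// Third duplicate ACK: halve the threshold (with a floor of two segments),
// retransmit the missing packet, and enter fast recovery.
_ssthresh = max(_cwnd/2, (uint32_t)(2 * _mss));
retransmit_packet(now);
_cwnd = _ssthresh + 3 * _mss;
_in_fast_recovery = true;
_recover = _highest_sent;
}  // closes the enclosing function, whose opening is not shown on the slide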

[Figure: bandwidth [0-100 kB/sec] against time [0-8 sec]]
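
The window arithmetic at the two ends of fast recovery can be isolated and tried on concrete numbers. The following is a minimal sketch that mirrors the last lines of the fragment above (entry on the third duplicate ACK) and its exit branch; the `Window` struct and the function names are hypothetical, and byte units are assumed because the fragment adds `_mss` to `_cwnd`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical holder for the two variables set on the third duplicate ACK.
struct Window {
    uint32_t cwnd;      // congestion window [bytes]
    uint32_t ssthresh;  // slow-start threshold [bytes]
};

// Entry into fast recovery: ssthresh = max(cwnd/2, 2*mss),
// then cwnd = ssthresh + 3*mss (one segment per duplicate ACK seen).
Window enter_fast_recovery(uint32_t cwnd, uint32_t mss) {
    const uint32_t ssthresh = std::max(cwnd / 2, 2 * mss);
    return Window{ssthresh + 3 * mss, ssthresh};
}

// Exit from fast recovery: cwnd = min(ssthresh, flightsize + mss).
uint32_t exit_fast_recovery(uint32_t ssthresh, uint32_t flightsize, uint32_t mss) {
    return std::min(ssthresh, flightsize + mss);
}
```

With a 20,000-byte window and a 1,000-byte segment, `enter_fast_recovery(20000, 1000)` yields a threshold of 10,000 bytes and a window of 13,000 bytes; the two-segment floor only matters for very small windows.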

How TCP shares capacity

[Figure: individual flow bandwidths, the sum of flow bandwidths, and the available bandwidth, plotted against time]

Macroscopic description of TCP

• Let x be the mean bandwidth of a flow [pkts/sec]
  Let RTT be the flow’s round-trip time [sec]
  Let p be the packet loss probability

• The TCP algorithm increases x at rate $1/\mathrm{RTT}^2$ [pkts/sec²]
  and reduces x by x/2 for every packet loss

• average increase in rate = average decrease in rate:

  $$\frac{1}{\mathrm{RTT}^2} = (p\,x)\,\frac{x}{2}$$
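
Solving the balance equation for x makes the dependence on loss explicit. The algebra below is a worked step added for illustration; it uses only the symbols x, RTT and p defined on this slide.

```latex
\[
\frac{1}{\mathrm{RTT}^2} = (p\,x)\,\frac{x}{2}
\quad\Longrightarrow\quad
x^2 = \frac{2}{p\,\mathrm{RTT}^2}
\quad\Longrightarrow\quad
x = \frac{1}{\mathrm{RTT}}\sqrt{\frac{2}{p}}
\quad[\text{pkts/sec}]
\]
```

For instance, with RTT = 0.1 sec and p = 0.01 this gives x = 10 · sqrt(200) ≈ 141 pkts/sec.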

Macroscopic description

• Consider a link with N identical flows
  Let NC be the capacity of the link [pkts/sec]

• packet loss ratio = fraction of work that exceeds service rate:

  $$p = \frac{(Nx - NC)^+}{Nx} = \frac{(x - C)^+}{x}$$

Fixed-point analysis

[Figure: log10 of packet loss probability against traffic intensity x/C, for C*RTT = 4, 20 and 100 pkts]
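
The fixed point can be computed in closed form by combining the TCP balance equation with the link loss formula from the macroscopic description. The sketch below is an illustration under those two equations only (no buffer is modelled); the function names are hypothetical. Writing rho = x/C and w = C*RTT, TCP gives p = 2/(rho*w)^2 and the link gives p = 1 - 1/rho, so rho^2 - rho - 2/w^2 = 0.

```cpp
#include <cmath>

// Traffic intensity rho = x/C at the fixed point, for w = C*RTT [pkts].
// Positive root of rho^2 - rho - 2/w^2 = 0.
double fixed_point_intensity(double w) {
    return 0.5 * (1.0 + std::sqrt(1.0 + 8.0 / (w * w)));
}

// Loss probability at the fixed point: p = (x - C)^+ / x = 1 - 1/rho.
// The root above always exceeds 1, so the positive part is not needed here.
double fixed_point_loss(double w) {
    return 1.0 - 1.0 / fixed_point_intensity(w);
}
```

Under these assumptions, C*RTT = 4, 20 and 100 pkts give loss probabilities of roughly 0.10, 0.005 and 0.0002, and the traffic intensity approaches 1 from above as C*RTT grows.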

Teleological description

• Consider several TCP flows sharing a single link

• Let $x_r$ be the mean bandwidth of flow r [pkts/sec]
  Let y be the total bandwidth of all flows [pkts/sec]
  Let C be the total available capacity [pkts/sec]

• TCP and the network act so as to solve

  $$\text{maximise } \sum_r U(x_r) - P(y, C) \quad \text{over } x_r \ge 0, \quad \text{where } y = \sum_r x_r$$

[Figure: utility U(x) against x, and penalty P(y,C) against y with C marked]

Rate control in communication networks: shadow prices, proportional fairness and stability
Journal of the Operational Research Society, 1998

F.P. Kelly, A.K. Maulloo, D.K.H. Tan (Statistical Laboratory, Cambridge)

Abstract. This paper analyses the stability and fairness of two classes of rate control algorithm for communication networks. The algorithms provide natural generalizations to large-scale networks of simple additive increase/multiplicative decrease schemes, and are shown to be stable about a system optimum characterized by a proportional fairness criterion. Stability is established by showing that, with an appropriate formulation of the overall optimization problem, the network's implicit objective function provides a Lyapunov function for the dynamical system defined by the rate control algorithm. The network's optimization problem may be cast in primal or dual form: this leads naturally to two classes of algorithm, which may be interpreted in terms of either congestion indication feedback signals or explicit rates based on shadow prices. Both classes of algorithm may be generalized to include routing control, and provide natural implementations of proportionally fair pricing.

http://www.statslab.cam.ac.uk/~frank/rate.html

Teleological description

[Figure: utility U(x) against bandwidth x]

little extra value attached to high-bandwidth flows

severe penalty for allocating too little bandwidth

Teleological description

[Figure: utility U(x) against bandwidth x]

flows with large RTT are satisfied with little bandwidth

flows with small RTT want more bandwidth
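
One way to see where this shape could come from is sketched below. It is not spelled out on the slides: it assumes that at the optimum each flow's marginal utility equals the loss probability p it sees (the usual first-order condition in the Kelly, Maulloo and Tan framework), and it takes p from the balance equation 1/RTT² = (p x) x/2 of the macroscopic description.

```latex
\[
U'(x) = p = \frac{2}{\mathrm{RTT}^2\,x^2}
\quad\Longrightarrow\quad
U(x) = -\frac{2}{\mathrm{RTT}^2\,x}
\quad\text{(up to an additive constant)}
\]
```

Under that assumption U is increasing and concave, flattens out as x grows, and falls without bound as x tends to 0; the 1/RTT² factor shrinks the whole curve for flows with a large RTT.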

Teleological description

[Figure: penalty P(y,C) against total bandwidth y, with capacity C marked]

no penalty unless links are overloaded
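
A penalty with this property can be written down explicitly under one assumption, which is not stated on the slides: that the marginal penalty equals the loss ratio of the macroscopic description, applied to the total rate y and total capacity C.

```latex
\[
\frac{\partial P}{\partial y} = \frac{(y - C)^+}{y}
\quad\Longrightarrow\quad
P(y, C) =
\begin{cases}
0, & y \le C,\\[4pt]
\displaystyle\int_C^y \Bigl(1 - \frac{C}{u}\Bigr)\,du = (y - C) - C \log\frac{y}{C}, & y > C.
\end{cases}
\]
```

The expression is zero up to y = C and grows smoothly beyond it, since log z ≤ z - 1 for all z > 0.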

Teleological description

• Is this what we want the Internet to optimize?
• Does it make good use of the network?
• Can it deliver high bandwidth and good quality?
• Is it a fair allocation?
• Can we design a better allocation?

[Figure: utility U(x) against x, and the penalty against y with C marked]

Teleology & dynamics

• The network acts to solve an optimization problem.
  – We can choose which optimization problem, by changing the router & TCP’s code.

• But the network may or may not attain the solution!
  – To understand this, we need a dynamical description of TCP

[Figure: utility U(x) against x, and the penalty against y with C marked]

Dynamical description

• Consider a link with N flows, capacity NC and buffer $N^{1/2}B$

• Let $x_t$ be the average bandwidth at time t
  Let $p_t$ be the packet loss probability at time t

• As $N \to \infty$ we believe a mean-field limit holds.

Dynamical description

• Fluid-based Analysis of a Network of AQM Routers Supporting TCP Flows with an Application to RED, SIGCOMM 2000
  Vishal Misra, Wei-Bo Gong, Don Towsley

• Rate-based versus queue-based models of congestion control, ACM Sigmetrics 2004
  Supratim Deb, R. Srikant

• Mean field convergence of a rate model of multiple TCP connections through a buffer implementing RED, to appear in Annals of Applied Probability
  David McDonald, Julien Reynier

Dynamical stability/instability

• For some values of C and RTT, the dynamical system is stable

• For others it is unstable and there are oscillations (i.e. the flows are partially synchronized)
  G. Raina and W. (2005)

[Figure: two panels of arrival rate x/C against time]

Illustration of instability

Standard TCP, single bottleneck link, no AQM, service C = 60 pkts/sec/flow, buffer B = 170 pkts, RTT = 200 ms, #flows N = 200

[Figure: queue size [0-170 pkts] and flow bandwidths [0-35 pkts/RTT] against time [80-90 sec]]

Instability plot

[Figure: log10 of packet loss probability against traffic intensity x/C, for C*RTT = 4, 20 and 100 pkts]

Alternative buffer-sizing rules

[Figure: four panels, one per rule below, with axes labelled r and p; curve labels recovered from the plots: b25, b100, b400; b10, b20, b50; b50, b1000]

• Rule-of-thumb buffer size
  buffer = bandwidth*delay

• Rule-of-thumb buffer size, with RED
  buffer = bandwidth*delay, drop packets selectively before the buffer fills

• Small buffers
  buffer = 50 pkts

• Small buffers, ScalableTCP
  buffer = 50 pkts, revised rate-increase rule

Scalable TCP: improving performance in highspeed wide area networks
SIGCOMM CCR 2003

Tom Kelly (CERN -- IT division)

Abstract. TCP congestion control can perform badly in highspeed wide area networks because of its slow response with large congestion windows. The challenge for any alternative protocol is to better utilize networks with high bandwidth-delay products in a simple and robust manner without interacting badly with existing traffic. Scalable TCP is a simple sender-side alteration to the TCP congestion window update algorithm. It offers a robust mechanism to improve performance in highspeed wide area networks using traditional TCP receivers. Scalable TCP is designed to be incrementally deployable and behaves identically to traditional TCP stacks when small windows are sufficient. The performance of the scheme is evaluated through experimental results gathered using a Scalable TCP implementation for the Linux operating system and a gigabit transatlantic network. The preliminary results gathered suggest that the deployment of Scalable TCP would have negligible impact on existing network traffic at the same time as improving bulk transfer performance in highspeed wide area networks.

http://www-lce.eng.cam.ac.uk/~ctk21/scalable/
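
The "slow response with large congestion windows" mentioned in the abstract can be made concrete by counting round trips to recover from a single loss. This is a minimal sketch with hypothetical function names. The Scalable TCP constants (increase of 0.01 packets per ACK, decrease by a factor of 0.125 on loss) are taken from the cited paper rather than from these slides, and the per-ACK increase is approximated as 1% growth per RTT.

```cpp
// Round trips needed for the window to climb back to w_before after one loss.

// Standard TCP: halve the window on loss, then add one packet per RTT.
int rtts_to_recover_standard(double w_before) {
    double w = w_before / 2.0;
    int rtts = 0;
    while (w < w_before) {
        w += 1.0;
        ++rtts;
    }
    return rtts;
}

// Scalable TCP: cut the window by 1/8 on loss, then grow by about 1% per RTT
// (0.01 packets per ACK, with roughly one ACK per packet in the window).
int rtts_to_recover_scalable(double w_before) {
    double w = 0.875 * w_before;
    int rtts = 0;
    while (w < w_before) {
        w *= 1.01;
        ++rtts;
    }
    return rtts;
}
```

In this sketch the standard rule needs 50 round trips at a window of 100 packets and 500 at a window of 1000, whereas the Scalable rule needs 14 in both cases, so its recovery time does not grow with the window.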

Teleological description

[Figure: utility U(x) against x, and penalty P(y,C) against y with C marked]

ScalableTCP gives more weight to high-bandwidth flows

With small buffers, the network likes to run with slightly lower utilization

Conclusion

• The network acts to solve an optimization problem.
  – We can choose which optimization problem, by choosing the right buffer size & changing TCP’s code.

• It might not attain the solution
  – In order to make sure the network is stable, we need to choose the buffer size & TCP code carefully.

• PROPOSAL
  – Buffers of size 20 packets in core routers
    keep utilization below 90%; eliminate delay and jitter
  – ScalableTCP
    able to deliver higher bandwidth than TCP