1 EE384Y: Packet Switch Architectures Part II Sizing Router Buffers (Recent work by Guido...

High Performance Switching and Routing, Telecom Center Workshop: Sept 4, 1997.

EE384Y: Packet Switch Architectures, Part II

Sizing Router Buffers

(Recent work by Guido Appenzeller)

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University

nickm@stanford.edu
http://www.stanford.edu/~nickm

How much Buffer does a Router need?

Universally applied rule-of-thumb: a router needs a buffer of size

$B = 2T \times C$

• 2T is the round-trip propagation time (or just 250ms)
• C is the capacity of the outgoing link

Background
• Mandated in backbone and edge routers.
• Appears in RFPs and IETF architectural guidelines.
• Has major consequences for router design.
• Comes from dynamics of TCP congestion control.
• Villamizar and Song: “High Performance TCP in ANSNET”, CCR, 1994.
• Based on 2 to 16 TCP flows at speeds of up to 40 Mb/s.

Example

10Gb/s linecard or router
• Requires 300Mbytes of buffering.
• Read and write new packet every 32ns.

Memory technologies
• SRAM: requires 80 devices, 1kW, $2000.
• DRAM: requires 4 devices, but too slow.

Problem gets harder at 40Gb/s
• Hence RLDRAM, FCRAM, etc.
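The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the rule-of-thumb B = 2T × C with 2T = 250ms, and it assumes a minimum-size packet of 40 bytes to arrive at the 32ns figure; the function names are illustrative only.

```python
def rule_of_thumb_buffer_bits(rtt_s: float, capacity_bps: float) -> float:
    """Rule-of-thumb buffer B = 2T * C in bits; rtt_s is the round-trip time 2T."""
    return rtt_s * capacity_bps


def packet_time_ns(packet_bytes: int, capacity_bps: float) -> float:
    """Time to send one packet at line rate, in nanoseconds."""
    return packet_bytes * 8 / capacity_bps * 1e9


if __name__ == "__main__":
    bits = rule_of_thumb_buffer_bits(0.250, 10e9)   # 2T = 250 ms, C = 10 Gb/s
    print(f"buffer: {bits / 1e9:.2f} Gbit = {bits / 8 / 1e6:.1f} Mbytes")
    # Assumption: 40-byte minimum-size packets.
    print(f"packet time: {packet_time_ns(40, 10e9):.1f} ns")
```

Under these assumptions the buffer comes to 2.5 Gbit, a little over 300 Mbytes, which is the order of magnitude the slide quotes.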

TCP adapts to congestion
• Sender sends packets, receiver sends ACKs
• Sending rate is controlled by window W
• At any time, only W unacknowledged packets may be outstanding

W is adjusted for each packet (in congestion avoidance (CA) mode):
• If ACK received: W = W + 1/W (W = W+1 for each W packets)
• If packet is lost: W = W/2 (W halved in case of loss)

The sending rate of TCP is:

$R = \frac{W}{RTT}$
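The window rules above can be written down directly. This is a minimal sketch of congestion-avoidance behaviour only (no slow-start, timeouts, or fast recovery); the function names are hypothetical.

```python
def on_ack(w: float) -> float:
    """Additive increase: each ACK grows the window by 1/W."""
    return w + 1.0 / w


def on_loss(w: float) -> float:
    """Multiplicative decrease: a lost packet halves the window."""
    return w / 2.0


def sending_rate(w: float, rtt_s: float) -> float:
    """Sending rate R = W / RTT, in packets per second."""
    return w / rtt_s


if __name__ == "__main__":
    w = 10.0
    for _ in range(10):          # roughly one window's worth of ACKs
        w = on_ack(w)
    print(f"after 10 ACKs: W = {w:.3f}")     # close to W + 1
    print(f"after a loss:  W = {on_loss(w):.3f}")
    print(f"rate at RTT 100 ms: {sending_rate(w, 0.1):.1f} packets/s")
```

Because 1/W shrinks as W grows, a full window of ACKs adds slightly less than one packet, which is why the slide describes the net effect as W = W+1 for each W packets.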

Single TCP Flow
Router with large enough buffers for full link utilization

[Figure: Source, router, Dest; link capacities C’ > C and C]

[Figure: window size, buffer size and RTT over time]

For every W ACKs received, send W+1 packets

[Figure: Over-buffered Link]

[Figure: Under-buffered Link]

[Figure: Buffer = Rule-of-thumb; interval magnified on next slide]

Microscopic TCP Behavior
When sender pauses, buffer drains

[Figure: one RTT; Drop]

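Origin of rule-of-thumb
Before and after reducing window size, the sending rate of the TCP sender is the same:

$R_{old} = R_{new}$

Inserting the rate equation we get

$\frac{W_{old}}{RTT_{old}} = \frac{W_{new}}{RTT_{new}}$

The RTT is part round-trip propagation delay 2T and part queueing delay B/C. We know that after reducing the window, the queueing delay is zero, and the new window is half the old one:

$\frac{W_{old}}{2T + B/C} = \frac{W_{old}/2}{2T}$

Solving for B gives $B = 2T \times C$.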
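The argument can be checked numerically with a deliberately simplified single-flow model. The sketch assumes the drop occurs when both the pipe (2T × C packets) and the buffer (B packets) are full, so W_max = 2T × C + B, and that once the queue has drained the sender's rate is W_new / 2T. The function name and the packet counts are illustrative.

```python
def utilization_after_halving(bdp_pkts: float, buffer_pkts: float) -> float:
    """Fraction of link capacity the sender fills right after W is halved.

    bdp_pkts is 2T * C expressed in packets; buffer_pkts is B in packets.
    Assumption: the loss occurs at W_max = bdp_pkts + buffer_pkts.
    """
    w_max = bdp_pkts + buffer_pkts
    w_new = w_max / 2.0
    # With an empty queue the rate is W_new / 2T; relative to C this is
    # W_new / (2T * C), capped at 1 because the link cannot exceed C.
    return min(1.0, w_new / bdp_pkts)


if __name__ == "__main__":
    bdp = 100.0
    for b in (0.0, 50.0, 100.0, 200.0):
        print(f"B = {b:5.0f} pkts -> utilization {utilization_after_halving(bdp, b):.2f}")
```

In this model the link only stays full after a loss when B is at least 2T × C; anything smaller leaves the halved window below the pipe size.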

Rule-of-thumb

• Rule-of-thumb makes sense for one flow
• Typical backbone link has > 20,000 flows
• Does the rule-of-thumb still hold?

Answer:
• If flows are perfectly synchronized, then Yes.
• If flows are desynchronized, then No.

Buffer size is height of sawtooth

If flows are synchronized
• Aggregate window has same dynamics
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds.

Two TCP Flows
Two TCP flows can synchronize

If flows are not synchronized
• Aggregate window has less variation
• Therefore buffer occupancy has less variation
• The more flows, the smaller the variation
• Rule-of-thumb does not hold.

[Figure: aggregate window between Min(W) and Max(W)]

If flows are not synchronized

[Figure: Probability Distribution; Buffer Size]

Quantitative Model
Model congestion window of a flow as random variable

$W_i(t)$ model as $P[W_i = x] = f_{W_i}(x)$ where $E[W_i] = \mu_W$ and $\mathrm{var}[W_i] = \sigma_W^2$

For many de-synchronized flows
• We assume congestion windows are independent
• All congestion windows have the same probability distribution

Now the central limit theorem gives us the distribution of the aggregate window, and hence of the queue length:

$\sum_{i=1}^{n} W_i(t) \rightarrow n\mu_W + \sqrt{n}\,\sigma_W\,N(0,1)$
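A small Monte Carlo experiment illustrates the central-limit argument: the sum of n independent windows has mean proportional to n but standard deviation proportional to √n, so its relative variation shrinks as 1/√n. The sketch assumes each window is drawn uniformly between a minimum and a maximum, a rough stand-in for a sawtooth sampled at a random time; the distribution, parameters, and function name are assumptions, not taken from the slides.

```python
import random
import statistics


def aggregate_window_cv(n_flows: int, samples: int = 2000,
                        w_min: float = 10.0, w_max: float = 20.0,
                        seed: int = 0) -> float:
    """Coefficient of variation (std / mean) of the sum of n_flows windows.

    Assumption: windows are independent and uniform on [w_min, w_max].
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.uniform(w_min, w_max) for _ in range(n_flows))
        for _ in range(samples)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (1, 4, 25, 100):
        print(f"n = {n:4d}: relative variation = {aggregate_window_cv(n):.4f}")
```

Going from 1 flow to 100 flows should cut the relative variation by roughly a factor of 10, which is the effect the desynchronized-flows slides describe.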

Required buffer size

[Figure: Simulation]

Required buffer size

[Figure: curve labelled 99.5% and $\frac{2T \times C}{\sqrt{n}}$, with n the number of flows]

Small buffers help short flows
Average flow completion times of 14 packet flows that share a congested bottleneck link with long-lived flows.

Experiments with backbone router
GSR 12000, OC3 Line Card

[Table: Router Buffer (Pkts, RAM) vs. Link Utilization (Model, Sim, Exp); surviving entries: 100, 0.5 x; 400, 0.5 x]

Thanks: Experiments conducted by Paul Barford and Joel Sommers, U of Wisconsin

What about Short Flows?

• So far we assumed long flows in congestion avoidance mode.
• What if traffic is mainly short flows in slow-start?

Answer: Behavior is different, but
• In mixes of flows, long flows drive buffer requirements
• Required buffer for short flows is independent of line speed and RTT (same for 1Mbit/s or 40 Gbit/s)

A single, short-lived TCP flow
Flow length 62 packets, RTT ~140 ms

[Figure: labels RTT, syn, fin ack received, Flow Completion Time (FCT)]

Modelling TCP
Flows vs. independent bursts

Inter-burst arrival time is greater than buffer size. Therefore, we assume bursts are independent.

Poisson arrivals of flows
• Arrivals of length $L_{flow}$ (the flow length in packets)
• $S_i = L_{flow}$
• $\lambda = \frac{\rho C}{L_{flow}}$

Poisson arrivals of bursts
• Four different Poisson arrival processes of lengths 2, 4, ...
• $S_i \in \{2, 4, 8, 16\}$
• $\lambda_{burst} = \frac{\rho C}{E[S_i]}$
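The burst lengths 2, 4, 8, 16 correspond to a flow in slow-start that doubles its window every round trip. The sketch below splits a flow of a given length into such bursts, assuming an initial burst of 2 packets, no losses, and no cap on the window; the function name is hypothetical.

```python
from typing import List


def slow_start_bursts(flow_len_pkts: int, initial_burst: int = 2) -> List[int]:
    """Split a short flow into per-RTT bursts that double each round trip.

    Assumptions: the first burst is initial_burst packets, nothing is lost,
    and the last burst carries whatever packets remain.
    """
    bursts = []
    remaining = flow_len_pkts
    size = initial_burst
    while remaining > 0:
        burst = min(size, remaining)
        bursts.append(burst)
        remaining -= burst
        size *= 2
    return bursts


if __name__ == "__main__":
    print(slow_start_bursts(62))   # [2, 4, 8, 16, 32]
    print(slow_start_bursts(14))   # [2, 4, 8]
```

A 62-packet flow therefore completes in five bursts, each roughly one RTT apart, and it is these bursts rather than whole flows that are treated as the independent arrivals.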

The M/G/1 Model
TCP traffic is modelled as an M/G/1 arrival process:
• Poisson arrivals of jobs with an arrival rate of $\lambda_{burst}$
• $S_i \in \{2, 4, 8, 16, ...\}$
• $\rho = \frac{\lambda_{burst} E[S_i]}{C}$ is the load

Average queue length in jobs is:

$E[N_Q] = \frac{\rho^2 E[S^2]}{2(1-\rho) E[S]^2}$

This gives us an average queue length in packets of

$E[Q] = E[N_Q]\,E[S] = \frac{\rho^2 E[S^2]}{2(1-\rho) E[S]}$
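As a worked instance, the mean queue length in packets for an M/G/1 queue can be evaluated as E[Q] = ρ² E[S²] / (2(1−ρ) E[S]), where S is the burst size in packets. The sketch assumes burst sizes 2, 4, 8 and 16 occur with equal probability unless weights are given; that weighting and the function name are illustrative assumptions.

```python
from typing import Optional, Sequence


def mg1_avg_queue_packets(rho: float, burst_sizes: Sequence[float],
                          probs: Optional[Sequence[float]] = None) -> float:
    """Mean M/G/1 queue length in packets: rho^2 * E[S^2] / (2 * (1 - rho) * E[S])."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load rho must satisfy 0 <= rho < 1")
    if probs is None:
        # Assumption: all burst sizes are equally likely.
        probs = [1.0 / len(burst_sizes)] * len(burst_sizes)
    e_s = sum(p * s for p, s in zip(probs, burst_sizes))        # E[S]
    e_s2 = sum(p * s * s for p, s in zip(probs, burst_sizes))   # E[S^2]
    return rho ** 2 * e_s2 / (2.0 * (1.0 - rho) * e_s)


if __name__ == "__main__":
    q = mg1_avg_queue_packets(0.8, [2, 4, 8, 16])
    print(f"average queue at load 0.8: {q:.2f} packets")
```

Neither the link capacity nor the RTT appears in the formula; only the load and the burst-size distribution matter.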

Let's see if this works in practice...

Average Queue length

[Figure: average queue length. Capacity: C = 40Mbit/s; load: ρ = 0.8; for flow length L_flow = 50 packets: 400Mbit, average 100 flows/second, completion time 400ms]

Queue Distribution
To determine the required buffer, we need the queue distribution.

Or at least the tail end of the queue distribution

[Figure: P(Q = x); Buffer B; Packet Loss]

● For M/G/1 queues there is no general solution for the queue distribution.

● We did two things (details are in the paper):

– Use M/G/1 processor sharing model (bad)
– Use Frank Kelly's effective bandwidth (good)

In Summary

Buffer size is dictated by long TCP flows.

10Gb/s linecard with 200,000 x 56kb/s flows
• Rule-of-thumb: Buffer = 2.5Gbits
  – Requires external, slow DRAM
• Becomes: Buffer = 6Mbits
  – Can use on-chip, fast SRAM
  – Completion time halved for short-flows

40Gb/s linecard with 40,000 x 1Mb/s flows
• Rule-of-thumb: Buffer = 10Gbits
• Becomes: Buffer = 50Mbits
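The summary figures are consistent with dividing the rule-of-thumb buffer by √n, where n is the number of long-lived flows. The sketch below treats B = 2T × C / √n as an assumption and uses 2T = 250ms; the function name is illustrative.

```python
import math


def small_buffer_bits(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Assumed scaling: B = (2T * C) / sqrt(n), in bits; rtt_s is 2T."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)


if __name__ == "__main__":
    for capacity_bps, n_flows in ((10e9, 200_000), (40e9, 40_000)):
        rule = 0.250 * capacity_bps
        small = small_buffer_bits(0.250, capacity_bps, n_flows)
        print(f"C = {capacity_bps / 1e9:.0f} Gb/s, n = {n_flows}: "
              f"rule-of-thumb {rule / 1e9:.1f} Gbit, reduced {small / 1e6:.1f} Mbit")
```

With these inputs the 10Gb/s case lands just under 6 Mbit and the 40Gb/s case at 50 Mbit, in line with the slide.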
