Post on 02-Jan-2016
Congestion Control (Chapter 6)
Outline
Resource Allocation Issues
Queuing Disciplines
FCFS (FIFO queues)
Priority Queuing
Fair Queuing (for flows)
TCP Congestion Control
Detection – Resolution approach (AIMD and Slow Start)
Alternatives: Fast Retransmit / Fast Recovery
Congestion Avoidance
router-centric: DECbit and RED Gateways
host-centric: TCP Vegas
QoS
Congestion Control
ISSUES:
• How to fairly allocate resources (link bandwidths and switch buffers) among users.
• Two sides of the same coin:
– Resource allocation so as to avoid congestion (difficult with any precision)
– Congestion control if (and when) it occurs
• Resource allocation and congestion control involve both:
– hosts at the edges of the network (transport protocols)
– routers inside the network (queuing disciplines)
• Underlying service model can be
– best-effort (assumed here: end hosts are given no opportunity to make QoS demands)
– multiple qualities of service, QoS (later)
[Figure: Congestion in a packet-switched network. Source 1 and Source 2 reach a router over a 10-Mbps Ethernet and a 100-Mbps FDDI; the router forwards to the destination over a 1.5-Mbps T1 link.]
Framework
• Connectionless flows assumed. What are they? Even though datagrams from a source to a destination are switched independently, they typically flow through the same path.
– Routers maintain soft state info
    • Somewhere between the hard state info of a VC switch (bandwidth, cell-loss ratio, etc.) and the absence of state in a pure connectionless network.
    • Correct operation does not depend on soft state info but is improved by it.
– Implicitly defined: router watches for what appears to be a flow (used in TCP Congestion Control).
– Explicitly defined: source sends a flow-setup message (flow about to start) across the network (a step down from a VC, since an explicit flow has no reliable, ordered delivery).
• Taxonomy of Resource Allocation/Congestion Control mechanisms
– Router-centric: address the problem inside the network (decide forwards/drops, inform hosts) versus Host-centric: address the problem from outside the network
– Reservation-based: hosts request capacity when the flow is established; versus Feedback-based: Explicit (e.g., congested router sends a “slow-down” message) or Implicit (e.g., host adjusts its rate based on, e.g., cell-loss rate)
– Window-based (telling sender the remaining buffer space, as in flow control) versus Rate-based (telling sender the rate at which data can be absorbed)
[Figure: Multiple flows passing through a set of routers. Source 1, Source 2 and Source 3 send through three routers to Destination 1 and Destination 2.]
Evaluation Criteria (of resource allocation effectiveness & fairness)
• Effective Resource Allocation (utilization issue, network-wide point of view): measured by Power = ratio of throughput to delay.
• Fair Resource Allocation (to individual senders)
– Can assume Fair means Equal shares
– E.g., Raj Jain proposed a metric for when Fair means Equal and all paths are of equal length:
Jain’s Fairness Index: given flow throughputs (units/sec) $x_1, x_2, \ldots, x_n$,
$$f(x_1, x_2, \ldots, x_n) = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2}$$
If all n flows have a throughput of 1 unit/sec, $f = n^2 / (n \cdot n) = 1$.
However, if k flows have throughput 1 and n-k have throughput 0, $f = k^2 / (n \cdot k) = k/n$ (less fair).
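The two cases above can be checked numerically. The following is a minimal sketch; the function name `jain_fairness` is chosen here for illustration and does not come from the slides.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum of x_i)^2 / (n * sum of x_i^2)."""
    n = len(throughputs)
    total = sum(throughputs)
    sum_of_squares = sum(x * x for x in throughputs)
    if n == 0 or sum_of_squares == 0:
        raise ValueError("need at least one flow with non-zero throughput")
    return (total * total) / (n * sum_of_squares)


# All n = 4 flows get 1 unit/sec: perfectly fair.
print(jain_fairness([1, 1, 1, 1]))  # 1.0
# Only k = 2 of n = 4 flows get any throughput: f = k/n.
print(jain_fairness([1, 1, 0, 0]))  # 0.5
```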
[Figure: Power (throughput/delay) versus load. Power peaks at the optimal load; beyond it lies thrashing, or congestion collapse.]
Queuing Discipline
(Each router implements a queuing discipline regardless of the resource allocation mechanism. The algorithm can be thought of as allocating both bandwidth (which packets get transmitted) and buffer space (which packets get dropped).)
• First-In-First-Out or FIFO (AKA: FCFS)
– Packets transmitted in arrival order.
– No discrimination between traffic sources.
– Usually used with “tail drop” policy.
– FIFO (the scheduling discipline) and tail drop (the drop policy) are usually treated as a bundle.
– Widely used in Internet.
– Variations include priority queuing.
• Fair Queuing (FQ) for Flows
– explicitly segregates traffic based on flows (separate queue per flow)
• Weighted Fair Queuing allows a weight to be assigned to each flow.
[Figure: Fair queuing. Flow 1, Flow 2, Flow 3 and Flow 4 each have their own queue; the queues receive round-robin service.]
Fair Queuing - FQ Algorithm
For simplicity, suppose the clock ticks each time a bit is transmitted (1 bit = 1 tick).
Let $P_i$ = length of packet i
$S_i$ = time when transmission of packet i starts
$F_i$ = time when transmission of packet i finishes
$F_i = S_i + P_i$
For a single flow, when does the router start transmitting packet i?
– If packet i arrives before the router has finished this flow’s packet i-1: right after the last bit of i-1 (at $F_{i-1}$).
– If there are no current packets for this flow: when packet i arrives (at time $A_i$).
Thus: $S_i = \max(F_{i-1}, A_i)$ and $F_i = \max(F_{i-1}, A_i) + P_i$
For multiple flows (not perfect: can’t preempt the current packet): calculate $F_i$ for each packet that arrives on each flow (treat the values as timestamps); the packet with the lowest timestamp is transmitted next.
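[Figure: Fair queuing examples. (a) Packets with finish times F = 8, F = 10 and F = 5 are queued on Flow 1 and Flow 2; the packet with the lowest finish time is output first, so shorter packets tend to go first. (b) A Flow 1 packet with F = 2 arrives while a Flow 2 packet with F = 10 is transmitting; the longer packet already in progress is completed first.]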
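The finish-time rule $F_i = \max(F_{i-1}, A_i) + P_i$ can be sketched with a priority queue keyed on the timestamp. This is only an illustration under the slide's simplification (one clock tick per bit, arrival times measured in the same ticks); the class and method names are hypothetical.

```python
import heapq


class FairQueue:
    """Toy fair queue: always serves the packet with the lowest finish time."""

    def __init__(self):
        self.last_finish = {}  # flow id -> finish time of that flow's latest packet
        self.heap = []         # entries: (finish time, arrival order, flow, length)
        self.order = 0         # tie-breaker so equal finish times stay FIFO

    def arrive(self, flow, arrival_time, length):
        """Timestamp a packet: F_i = max(F_{i-1}, A_i) + P_i."""
        start = max(self.last_finish.get(flow, 0), arrival_time)
        finish = start + length
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.order, flow, length))
        self.order += 1
        return finish

    def next_packet(self):
        """Pick the queued packet with the lowest timestamp (no preemption)."""
        finish, _, flow, length = heapq.heappop(self.heap)
        return flow, length, finish


fq = FairQueue()
fq.arrive(flow=1, arrival_time=0, length=8)  # F = 8
fq.arrive(flow=1, arrival_time=0, length=2)  # F = 10 (waits behind the first)
fq.arrive(flow=2, arrival_time=0, length=5)  # F = 5
print(fq.next_packet())  # (2, 5, 5): flow 2's packet goes first
```

Because `next_packet` is only called when the link is free, a packet already in transmission is never interrupted, which matches the "can't preempt" caveat.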
TCP Congestion Control
• Idea
– assumes best-effort network (FIFO or FQ routers); each source determines network capacity for itself
– uses implicit feedback (host adjusts rate based on its own knowledge)
– ACKs pace transmission (self-clocking), i.e., only allow n outstanding un-ACKed packets
• Challenge
– determining the available capacity in the first place
– adjusting to changes in the available capacity
• AIMD and Slow Start were the original solutions for TCP
Additive Increase/Multiplicative Decrease (AIMD)
Objective: adjust to changes in the available capacity
• New state variable per connection: CongestionWindow
– set by source to limit the number of packets in transit
• Recall: flow control’s AdvertisedWindow = amount of data the destination can still buffer
MaxWin = MIN( CongestionWindow, AdvertisedWindow )
EffWin = MaxWin - ( LastByteSent - LastByteAcked )
where ( LastByteSent - LastByteAcked ) is the amount of outstanding (sent but not yet ACKed) data
• Idea:
– increase CongestionWindow when congestion goes down
– decrease CongestionWindow when congestion goes up
• Question: how does the source determine whether or not the network is congested?
• Answer: a packet timeout occurs (i.e., an ACK is late)
– Assumes a timeout signals that a packet was dropped due to congestion (packet loss is seldom due to transmission error)
– lost packet implies congestion
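As a small worked example of the two window formulas, the sketch below computes how much more data a sender may put on the wire. The function name and the clamp to zero are assumptions made here for illustration; all quantities are in bytes.

```python
def effective_window(congestion_window, advertised_window,
                     last_byte_sent, last_byte_acked):
    """EffWin = MIN(CongestionWindow, AdvertisedWindow) - outstanding bytes."""
    max_win = min(congestion_window, advertised_window)
    outstanding = last_byte_sent - last_byte_acked
    # Assumption: never report a negative window; the sender simply waits.
    return max(0, max_win - outstanding)


# Congestion-limited: MaxWin = 8000, 3000 bytes outstanding.
print(effective_window(8000, 16000, 5000, 2000))  # 5000
# Window already full: nothing more may be sent until ACKs arrive.
print(effective_window(4000, 16000, 9000, 2000))  # 0
```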
AIMD (cont)
Algorithm: Each time the source successfully sends a CongestionWindow’s worth of packets, increase CongestionWindow by 1 packet (additive increase). Divide CongestionWindow by 2 on each timeout (multiplicative decrease), but never let it fall below the Maximum Segment Size (MSS is in bytes; usually 1 packet).
• In practice however, TCP increments a little for each ACK, using:
Increment = MSS * (MSS/CongestionWindow)
CongestionWindow += Increment
[Figure: Source/Destination timeline showing packets in flight during additive increase.]
[Figure: Trace of CongestionWindow (KB, 10 to 70) versus time (0 to 10 seconds) showing the sawtooth behavior of AIMD.]
AIMD works well when the source is operating close to the available capacity of the network, but it takes too long to ramp up from scratch. SLOW START (ironically named) is intended to solve that using multiplicative increase.
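The per-ACK increment and the halving on timeout can be sketched as two small functions. This is a minimal illustration, with the window kept in bytes and an MSS of 1460 bytes assumed purely for the example.

```python
MSS = 1460  # bytes; an assumed value for illustration


def on_ack(congestion_window, mss=MSS):
    """Additive increase: grow by MSS * (MSS / CongestionWindow) per ACK."""
    return congestion_window + mss * (mss / congestion_window)


def on_timeout(congestion_window, mss=MSS):
    """Multiplicative decrease: halve the window, but never below one MSS."""
    return max(congestion_window / 2, mss)


cw = 4 * MSS
for _ in range(4):  # one window's worth of ACKs
    cw = on_ack(cw)
print(cw / MSS)     # a little under 5: roughly one extra packet per window
print(on_timeout(8 * MSS) / MSS)  # 4.0
```

Because the increment shrinks as the window grows within the round, a full window of ACKs adds slightly less than one MSS in this sketch.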
Slow Start (2nd mechanism provided by TCP)
• Start with CongestionWindow (CW) = 1 packet: a slow start compared to starting with CongestionWindow = AdvertisedWindow
• Double CongestionWindow each RTT (multiplicative increase) until it reaches CongestionThreshold (CT), then increment by 1 packet per RTT.
• Used when first starting a connection and when a connection goes dead waiting for a timeout (another “start over” situation).
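The growth pattern can be sketched per RTT, with windows counted in packets. The function names are hypothetical, and capping the doubling step at CT is a simplifying assumption of this sketch.

```python
def window_per_rtt(threshold, rtts, start=1):
    """Return CongestionWindow (in packets) at the start of each RTT."""
    cw = start
    trace = []
    for _ in range(rtts):
        trace.append(cw)
        if cw < threshold:
            cw = min(cw * 2, threshold)  # slow start: double each RTT up to CT
        else:
            cw += 1                      # then additive increase: +1 per RTT
    return trace


def slow_start_timeout(cw):
    """On timeout: CT = CW / 2, and CW restarts at 1 packet."""
    return max(cw // 2, 1), 1


print(window_per_rtt(threshold=16, rtts=8))  # [1, 2, 4, 8, 16, 17, 18, 19]
print(slow_start_timeout(34))                # (17, 1)
```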
[Figure: Slow Start trace. CongestionWindow (KB, 10 to 70) versus time in seconds (0 to 9). Hash marks = times when each packet is transmitted; a second row of marks shows the times when retransmitted packets were first transmitted; timeouts are marked. Annotations on the trace:
– No increase: no ACKs arriving, due to lost packets
– Timeout: CT = CW/2 = 17; CW reset (drops to about 0 on the plot)
– Multiplicative increase until CT, then additive increase
– No increase: no ACKs arriving
– Timeout: CT = CW/2 = 11; CW reset (drops to about 0 on the plot)
– Multiplicative increase until CT, then additive increase]
[Figure: Source/Destination timeline showing packets in flight during slow start.]
Fast Retransmit
Problem: Coarse-grain TCP timeouts lead to idle periods.
Fast retransmit: use duplicate ACKs to trigger retransmission.
Idea: every time a packet arrives, the receiver sends an ACK. Thus, when a packet arrives out of order (and TCP can’t ACK it because earlier packets have not yet arrived), TCP resends the last legitimate cumulative ACK (called a duplicate ACK).
When the sender sees 3 duplicate ACKs, it retransmits the missing packet (the one following the last packet ACKed).
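The duplicate-ACK rule can be sketched as a small counter on the sender side. This is an illustration only: ACKs are numbered by packet (ACK n means everything through packet n has arrived), and the class name is hypothetical.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and signals a retransmission on the third one."""

    DUP_THRESHOLD = 3

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no):
        """Return the packet number to retransmit, or None."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.DUP_THRESHOLD:
                return ack_no + 1  # the packet right after the last one ACKed
            return None
        # A new cumulative ACK: remember it and reset the duplicate count.
        self.last_ack = ack_no
        self.dup_count = 0
        return None


sender = FastRetransmitSender()
# Packet 3 is lost; packets 4, 5 and 6 each trigger a duplicate ACK 2.
for ack in [1, 2, 2, 2, 2, 6]:
    print(ack, sender.on_ack(ack))  # the third duplicate ACK 2 returns 3
```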
Fast Recovery: Upon congestion, rather than drop back to 0 and use Slow Start, just cut the window in half and resume additive increase.
[Figure: Fast retransmit timeline between Sender and Receiver. The sender transmits packets 1 to 6; the receiver returns ACK 1, ACK 2, and then a duplicate ACK 2 for each later packet that arrives; the sender retransmits packet 3; the receiver returns ACK 6.]
[Figure: Trace of CongestionWindow with fast retransmit. CongestionWindow (KB, 10 to 70) versus time in seconds (0 to 7). Hash marks = times when each packet is transmitted; a second row of marks shows the times when retransmitted packets were first transmitted; a timeout is marked. Fast retransmit eliminates many of the flat areas where no packets were transmitted.]
Congestion Avoidance
• TCP’s strategy is to control congestion once it happens (repeatedly increase load to find the point at which congestion occurs, and then back off)
• Alternative strategy
– predict when congestion is about to happen
– reduce rate before packets start being discarded
– call this congestion avoidance, instead of congestion control
• Two possibilities
– router-centric: DECbit and RED Gateways
– host-centric: TCP Vegas
DECbit
• Add congestion bit to packet header.
• Router
– monitors average queue length over last busy-idle cycle + current busy cycle
– sets congestion bit if average queue length ≥ 1
• End Host
– Destination echoes bit back to source
– Source records how many packets resulted in set bit
– If less than 50% of last CongestionWindow’s worth had bit set
    • increase CongestionWindow by 1 packet
– If 50% or more of last window’s worth had bit set
    • decrease CongestionWindow to 7/8 of its value.
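The source's side of DECbit can be sketched as a single decision over the last window's worth of echoed bits. The function name and the list-of-booleans representation are assumptions for illustration; the window is counted in packets.

```python
def decbit_adjust(congestion_window, echoed_bits):
    """Adjust the window from the congestion bits echoed for the last window.

    echoed_bits holds one boolean per packet: True if its bit was set.
    """
    if not echoed_bits:
        return congestion_window  # nothing observed, leave the window alone
    fraction_set = sum(echoed_bits) / len(echoed_bits)
    if fraction_set < 0.5:
        return congestion_window + 1     # additive increase by 1 packet
    return congestion_window * 0.875     # decrease to 7/8 of the old value


print(decbit_adjust(8, [False] * 6 + [True] * 2))  # 9
print(decbit_adjust(8, [True] * 4 + [False] * 4))  # 7.0
```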
Random Early Detection (RED)
• Notification is implicit
– Router just drops the packet when congested (TCP will time out)
• Early random drop
– rather than wait for queue to become completely full, drop each arriving packet with some drop probability whenever the queue length exceeds some drop level
[Figure: DECbit averaging interval. Queue length versus time; the averaging interval covers the previous cycle and the current cycle, up to the current time.]
RED Details
Compute average queue length:
AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
0 < Weight < 1 (usually 0.002)
SampleLen = queue length each time a packet arrives
Two queue length thresholds:
if AvgLen ≤ MinThreshold then enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then calculate probability P and drop the arriving packet with probability P
if MaxThreshold ≤ AvgLen then drop the arriving packet
Computing probability P:
TempP = MaxP * (AvgLen - MinThreshold) / (MaxThreshold - MinThreshold)
P = TempP / (1 - count * TempP)
count = number of arriving packets that have been queued (not dropped) while AvgLen has been between the two thresholds
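[Figure: RED thresholds on a queue. MinThreshold and MaxThreshold are marked against AvgLen, the weighted running average queue length.]
[Figure: Drop probability curve. P(drop) versus AvgLen: zero up to MinThresh, rising to MaxP at MaxThresh, and 1.0 beyond MaxThresh.]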
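The RED rules above can be sketched as a small class that keeps the running average and the count between drops. This is a minimal illustration, not a router implementation: the class name, the injectable `rand` argument, and the choice to reset `count` to 0 whenever the average is at or below MinThreshold are assumptions made here.

```python
import random


class RedQueue:
    """Toy RED drop decision: tracks AvgLen and count, not the packets."""

    def __init__(self, min_threshold, max_threshold, max_p, weight=0.002):
        self.min_threshold = min_threshold
        self.max_threshold = max_threshold
        self.max_p = max_p
        self.weight = weight
        self.avg_len = 0.0
        self.count = 0  # packets queued since the last drop, between thresholds

    def update_average(self, sample_len):
        """AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen."""
        self.avg_len = (1 - self.weight) * self.avg_len + self.weight * sample_len
        return self.avg_len

    def drop_probability(self):
        if self.avg_len <= self.min_threshold:
            return 0.0
        if self.avg_len >= self.max_threshold:
            return 1.0
        temp_p = (self.max_p * (self.avg_len - self.min_threshold)
                  / (self.max_threshold - self.min_threshold))
        denominator = 1 - self.count * temp_p
        if denominator <= 0:
            return 1.0  # enough packets since the last drop: drop for certain
        return min(1.0, temp_p / denominator)

    def on_arrival(self, sample_len, rand=random.random):
        """Return True if the arriving packet should be dropped."""
        self.update_average(sample_len)
        p = self.drop_probability()
        if p > 0.0 and rand() < p:
            self.count = 0
            return True
        self.count = self.count + 1 if p > 0.0 else 0
        return False
```

Dividing by `1 - count * TempP` raises the drop probability the longer it has been since the last drop, which spreads drops out in time instead of clustering them.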
TCP Vegas (host-centric congestion avoidance)
Idea: source watches for some sign that a router’s queue is building (e.g., RTT grows; sending rate flattens)
ExpectedRate = CW / BaseRTT
– BaseRTT = min of all measured RTTs; typically the RTT of the 1st packet
ActualRate
– source calculates the current sending rate as the # of bytes sent during the RTT of a distinguished packet, divided by that RTT
Diff = ExpectedRate – ActualRate
if Diff < α increase CW linearly (roughly corresponds to too little data in the network)
else if Diff > β decrease CW linearly (roughly corresponds to too much data in the network)
else leave CW unchanged (when α < Diff < β)
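The Vegas decision rule can be sketched as one function called once per RTT. The units (window in KB, rates in KB/s), the 1 KB step, and the example thresholds of 30 and 60 KB/s are assumptions chosen only to make the arithmetic easy to follow.

```python
def vegas_adjust(cw, base_rtt, actual_rate, alpha, beta, step=1.0):
    """Return the new congestion window after one Vegas comparison.

    cw is in KB, base_rtt in seconds, rates and thresholds in KB/s.
    """
    expected_rate = cw / base_rtt
    diff = expected_rate - actual_rate
    if diff < alpha:
        return cw + step  # too little data in the network: grow linearly
    if diff > beta:
        return cw - step  # too much data in the network: shrink linearly
    return cw             # alpha <= diff <= beta: leave unchanged


# ExpectedRate = 20 KB / 0.1 s = 200 KB/s in all three cases.
print(vegas_adjust(20, 0.1, actual_rate=190, alpha=30, beta=60))  # 21.0
print(vegas_adjust(20, 0.1, actual_rate=160, alpha=30, beta=60))  # 20
print(vegas_adjust(20, 0.1, actual_rate=100, alpha=30, beta=60))  # 19.0
```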
TCP Vegas (trace of congestion avoidance mechanism)
Parameters: α = 1 packet, β = 3 packets
[Figure: Congestion Window Trace for TCP Vegas. Top panel: congestion window (KB, 10 to 70) versus time (0.5 to 8.0 seconds). Bottom panel (labeled CAM): expected and actual throughput (KBps, 40 to 240) versus time (0.5 to 8.0 seconds).]
The shaded area is the region between α and β units away from the expected throughput; the goal is to keep the actual throughput in this region. Note that the actual throughput gets dragged along by the shaded region.
QoS: Real-time App
• Requires “deliver on time” assurances
– must come from inside the network (hosts cannot make such guarantees alone)
• Example application (audio)
– sample voice once every 125 µs
– each sample has a playback time
– packets experience variable delay in network
– add constant factor to playback time: playback point
• Playback Buffer
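[Figure: Audio application. Microphone, then sampler and A/D converter, then across the network to a buffer and D/A converter, then speaker.]
[Figure: Playback buffer. Sequence number versus time, with curves for packet generation, packet arrival (after the network delay), and playback (after time spent in the buffer).]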
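The playback point can be illustrated with a few timestamps. This is a sketch only: the function name, the millisecond units and the sample numbers are made up for the example.

```python
def playback_schedule(packets, playback_offset_ms):
    """Decide which packets arrive in time for their playback point.

    packets is a list of (generation time, arrival time) pairs in ms.
    Returns a list of (playback time, arrived on time?) pairs.
    """
    schedule = []
    for generated_ms, arrived_ms in packets:
        playback_ms = generated_ms + playback_offset_ms  # constant offset
        schedule.append((playback_ms, arrived_ms <= playback_ms))
    return schedule


# With a 100 ms playback offset, a packet delayed by 50 ms is fine,
# but one delayed by 130 ms misses its playback point.
print(playback_schedule([(0, 50), (20, 150)], playback_offset_ms=100))
# [(100, True), (120, False)]
```

A larger offset tolerates more delay variation at the cost of added latency; packets that arrive early simply wait in the playback buffer.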
Integrated Services
• Refers to the body of work by the IETF working group on Integrated Services, 1995-97.
• Integrated Services allocates resources to individual flows
– whereas Differentiated Services allocates resources by “classes of traffic”
• Integrated Services service classes
– E.g., Guaranteed service (packets are never late: guaranteed max delay time)
• Flowspecs (set of info we provide to the network to specify needs)
– Tspec
    • describes the flow’s Traffic characteristics (e.g., average bandwidth, token bucket parameters)
– Rspec
    • describes the services Requested from the network
    – E.g., guarantees, such as a delay target
RSVP: Resource reSerVation Protocol
• While connection-oriented networks have setup protocols, best-effort connectionless networks don’t; they need some sort of reservation protocol in order to offer QoS.
– Internet resource reservation corresponds to signaling in ATM
– Proposed Internet standard is called RSVP
• Receiver-oriented
• 2 messages: PATH and RESV
• Source transmits PATH messages every 30 seconds toward the receivers.
• Destination responds with a RESV message, which makes the reservation back along the path.
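[Figure: RSVP example. Sender 1 and Sender 2 send PATH messages through routers (R) toward Receiver A and Receiver B; the receivers return RESV messages, which are merged where their paths join.]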
RSVP versus ATM (Q.2931)
• RSVP
– receiver generates reservation
– soft state info used in routers (it is refreshed/timed out)
– separate from route establishment
– QoS can change dynamically
• ATM
– sender generates connection request
– hard state info (requires explicit delete at teardown)
– concurrent with route establishment
– QoS is static for life of connection
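The difference between soft and hard state can be sketched with a table whose entries vanish unless refreshed. This is an illustration only; the class name and the 90-second lifetime (three missed 30-second refreshes) are assumptions, not values taken from the slides.

```python
class SoftStateTable:
    """Toy soft-state table: an entry lives only as long as it is refreshed."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.expires = {}  # flow id -> time at which the entry times out

    def refresh(self, flow, now):
        """Install or refresh a reservation; no explicit teardown is needed."""
        self.expires[flow] = now + self.lifetime

    def expire(self, now):
        """Remove and return the flows whose state has timed out."""
        stale = [flow for flow, t in self.expires.items() if t <= now]
        for flow in stale:
            del self.expires[flow]
        return stale

    def active(self):
        return sorted(self.expires)


table = SoftStateTable(lifetime=90)
table.refresh("A", now=0)
table.refresh("B", now=0)
table.refresh("A", now=30)    # A keeps refreshing; B goes silent
print(table.expire(now=100))  # ['B']
print(table.active())         # ['A']
```

With hard state, as in ATM, the entry for B would stay until an explicit delete arrived.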