Equilibrium & Dynamics of TCP/AQM
Steven Low, CS & EE, Caltech
netlab.caltech.edu
Sigcomm, August 2001
Acknowledgments
S. Athuraliya, D. Lapsley, V. Li, Q. Yin (UMelb)
S. Adlakha (UCLA), J. Doyle (Caltech), F. Paganini (UCLA), J. Wang (Caltech)
L. Peterson, L. Wang (Princeton)
Matthew Roughan (AT&T Labs)
Outline
Introduction
TCP/AQM algorithms
Duality model: equilibrium of TCP/AQM
Linear dynamic model: dynamics of TCP/AQM
A scalable control
Schedule
1:30 – 2:30  TCP/AQM algorithms
2:30 – 3:00  Duality model (equilibrium)
3:00 – 3:30  Break
3:30 – 4:00  Duality model (equilibrium)
4:00 – 4:30  Linear model (dynamic)
4:30 – 5:00  Scalable control
Part 0: Introduction
TCP/IP Protocol Stack
Applications (e.g. Telnet, HTTP)
TCP, UDP, ICMP, ARP, IP
Link Layer (e.g. Ethernet, ATM)
Physical Layer (e.g. Ethernet, SONET)
Packet Terminology
Application message
TCP segment: TCP header (20 bytes) + TCP data; MSS = maximum payload per segment
IP packet: IP header (20 bytes) + IP data
Ethernet frame: Ethernet header (14 bytes) + data + trailer (4 bytes); MTU = 1500 bytes
Success of IP
Simple/robust: robustness against failure, robustness against technological evolutions
Provides a service to applications; doesn't tell applications what to do
Quality of Service: can we provide QoS with simplicity? Not with current TCP… but we can fix it!
[Hourglass figure: applications (WWW, Email, Napster, FTP, …) above IP; transmission technologies (Ethernet, ATM, POS, WDM, …) below.]
IETF
Internet Engineering Task Force
Standards organisation for Internet
Publishes RFCs (Requests For Comment):
  standards track: proposed, draft, Internet standard
  non-standards track: experimental, informational
  best current practice
  poetry/humour (RFC 1149: Standard for the transmission of IP datagrams on avian carriers)
TCP should obey RFC: no means of enforcement; some versions have not followed RFC
http://www.ietf.org/index.html
Simulation
ns-2 (http://www.isi.edu/nsnam/ns/index.html): wide variety of protocols; widely used/tested
SSFNET (http://www.ssfnet.org/homePage.html): scalable to very large networks
Care should be taken in simulations!
  Multiple independent simulations: confidence intervals; transient analysis (make sure simulations are long enough)
  Wide parameter ranges
  All simulations involve approximation
Other Tools
tcpdump: get packet headers from real network traffic
tcpanaly (V. Paxson, 1997): analysis of TCP sessions from tcpdump
traceroute: find routing of packets
RFC 2398
http://www.caida.org/tools/
Part I: Algorithms
Outline
Introduction
TCP/AQM algorithms: window flow control; source algorithms (Tahoe, Reno, Vegas); link algorithms (RED, REM)
Duality model: equilibrium of TCP/AQM
Linear dynamic model: dynamics of TCP/AQM
A scalable control
Early TCP
Pre-1988: Go-back-N ARQ
  Detects loss from timeout
  Retransmits from lost packet onward
Receiver window flow control: prevent overflow at receive buffer
Flow control: self-clocking
Why Flow Control?
October 1986, Internet had its first congestion collapse
Link LBL to UC Berkeley: 400 yards, 3 hops, 32 Kbps
Throughput dropped to 40 bps: a factor of ~1000 drop!
1988, Van Jacobson proposed TCP flow control
Window Flow Control
~W packets per RTT; lost packet detected by missing ACK
[Figure: source sends a window of W packets (1, 2, …, W) per RTT; the destination's ACKs clock out the next window.]
Source Rate
Limit the number of packets in the network to window W
Source rate = (MSS × W) / RTT bps
If W too small then rate « capacity
If W too big then rate > capacity => congestion
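As a quick sanity check of the rate formula, a few lines of Python (the packet size, window and RTT values here are illustrative, not from the slides):

```python
def source_rate_bps(mss_bytes, window_pkts, rtt_s):
    """Throughput of window flow control: rate = MSS * W / RTT."""
    return mss_bytes * 8 * window_pkts / rtt_s

# e.g. 1500-byte packets, W = 40, RTT = 100 ms
rate = source_rate_bps(1500, 40, 0.1)   # 4.8 Mbps
```

With W fixed, the achieved rate falls as RTT grows, which is why TCP must adapt W to the path.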
Effect of Congestion
Packet loss
Retransmission
Reduced throughput
Congestion collapse due to unnecessarily retransmitted packets and undelivered or unusable packets
Congestion may continue after the overload!
[Figure: throughput vs load.]
Congestion Control
TCP seeks to:
  Achieve high utilization
  Avoid congestion
  Share bandwidth
Window flow control: source rate = W/RTT packets/sec
Adapt W to network (and conditions): W = BW × RTT
TCP Window Flow Controls
Receiver flow control: avoid overloading receiver; set by receiver; awnd: receiver (advertised) window
Network flow control: avoid overloading network; set by sender; infer available network capacity; cwnd: congestion window
Set W = min(cwnd, awnd)
Receiver Flow Control
Receiver advertises awnd with each ACK
Window awnd is closed when data is received and ack'd, opened when data is read
Size of awnd can be the performance limit (e.g. on a LAN); sensible default ~16 kB
Network Flow Control
Source calculates cwnd from indication of network congestion
Congestion indications: losses, delay, marks
Algorithms to calculate cwnd: Tahoe, Reno, Vegas, RED, REM, …
Outline
Introduction
TCP/AQM algorithms: window flow control; source algorithms (Tahoe, Reno, Vegas); link algorithms (RED, REM)
Duality model: equilibrium of TCP/AQM
Linear dynamic model: dynamics of TCP/AQM
A scalable control
TCP Congestion Control
Tahoe (Jacobson 1988): slow start, congestion avoidance, fast retransmit
Reno (Jacobson 1990): fast recovery
Vegas (Brakmo & Peterson 1994): new congestion avoidance
RED (Floyd & Jacobson 1993): probabilistic marking
REM (Athuraliya & Low 2000): clear buffer, match rate
Others…
Variants
Tahoe & Reno: NewReno, SACK, rate-halving, modifications for high performance
AQM: RED, ARED, FRED, SRED; BLUE, SFB; REM, PI
TCP Tahoe (Jacobson 1988)
[Figure: window vs time; slow start (SS) followed by the congestion avoidance (CA) sawtooth.]
SS: Slow Start; CA: Congestion Avoidance
Slow Start
Start with cwnd = 1 (slow start)
On each successful ACK, increment cwnd: cwnd ← cwnd + 1
Exponential growth of cwnd: each RTT, cwnd ← 2 × cwnd
Enter CA when cwnd ≥ ssthresh
Slow Start
[Figure: sender/receiver timeline; cwnd grows 1, 2, 4, 8 over successive RTTs.]
cwnd ← cwnd + 1 (for each ACK)
Congestion Avoidance
Starts when cwnd ≥ ssthresh
On each successful ACK: cwnd ← cwnd + 1/cwnd
Linear growth of cwnd: each RTT, cwnd ← cwnd + 1
Congestion Avoidance
[Figure: sender/receiver timeline; cwnd grows by 1 per RTT.]
cwnd ← cwnd + 1 (for every cwnd ACKs)
Packet Loss
Assumption: loss indicates congestion
Packet loss detected by:
  Retransmission timeouts (RTO timer)
  Duplicate ACKs (at least 3)
[Figure: packets 1–7 sent, packet 4 lost; the receiver repeats ACK 3, producing duplicate ACKs.]
Fast Retransmit
Waiting for a timeout is quite long
Immediately retransmit after 3 dupACKs, without waiting for timeout
Adjust ssthresh: flightsize = min(awnd, cwnd); ssthresh ← max(flightsize/2, 2)
Enter slow start (cwnd = 1)
Successive Timeouts
When there is a timeout, double the RTO
Keep doing so for each lost retransmission: exponential back-off
Max 64 seconds¹; max 12 retransmits¹
¹ Net/3 BSD
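The back-off schedule just described is easy to write out (a sketch using the Net/3 BSD numbers from this slide; the 1-second initial RTO is an assumed starting value):

```python
def rto_schedule(initial_rto=1.0, cap=64.0, max_retransmits=12):
    """Successive RTO values under exponential back-off."""
    rto, schedule = initial_rto, []
    for _ in range(max_retransmits):
        schedule.append(rto)
        rto = min(2 * rto, cap)   # double, but never beyond the cap
    return schedule

sched = rto_schedule()
# doubles 1, 2, 4, ... then sticks at the 64-second cap
```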
Summary: Tahoe
Basic ideas: gently probe network for spare capacity; drastically reduce rate on congestion; windowing: self-clocking; other functions: round-trip time estimation, error recovery

for every ACK {
  if (W < ssthresh) then W++   (SS)
  else W += 1/W                (CA)
}
for every loss {
  ssthresh = W/2
  W = 1
}
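The per-ACK and per-loss rules above are directly executable; a minimal sketch (window as a float, the initial ssthresh of 8 is an arbitrary choice):

```python
class TahoeWindow:
    """Tahoe congestion window: slow start, congestion avoidance, loss reaction."""
    def __init__(self, ssthresh=8.0):
        self.W, self.ssthresh = 1.0, ssthresh

    def on_ack(self):
        if self.W < self.ssthresh:
            self.W += 1.0            # SS: +1 per ACK, doubles every RTT
        else:
            self.W += 1.0 / self.W   # CA: +1/W per ACK, +1 per RTT

    def on_loss(self):
        self.ssthresh = max(self.W / 2.0, 2.0)
        self.W = 1.0                 # Tahoe re-enters slow start

tcp = TahoeWindow()
for _ in range(7):
    tcp.on_ack()    # slow start: W goes 1 -> 8
tcp.on_ack()        # now in CA: W = 8 + 1/8
tcp.on_loss()       # ssthresh = W/2, W back to 1
```

Running it for many ACK/loss cycles reproduces the sawtooth in the figure above.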
TCP Tahoe (Jacobson 1988)
[Figure: window vs time; slow start (SS) followed by the congestion avoidance (CA) sawtooth.]
SS: Slow Start; CA: Congestion Avoidance
TCP Reno (Jacobson 1990)
[Figure: window vs time; SS then CA, with fast retransmission/fast recovery avoiding the return to SS.]
SS: Slow Start; CA: Congestion Avoidance; fast retransmission/fast recovery
Fast recovery
Motivation: prevent `pipe' from emptying after fast retransmit
Idea: each dupACK represents a packet having left the pipe (successfully received)
Enter FR/FR after 3 dupACKs:
  Set ssthresh ← max(flightsize/2, 2)
  Retransmit lost packet
  Set cwnd ← ssthresh + ndup (window inflation)
  Wait till W = min(awnd, cwnd) is large enough; transmit new packet(s)
  On non-dup ACK (1 RTT later), set cwnd ← ssthresh (window deflation); enter CA
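The window arithmetic at FR/FR entry and exit can be sketched as follows (a toy calculation, not a full TCP state machine; the flightsize of 8 matches the example on the next slide):

```python
def enter_frfr(flightsize, ndup=3):
    """Window inflation on entering fast recovery (Reno)."""
    ssthresh = max(flightsize / 2.0, 2.0)
    cwnd = ssthresh + ndup      # inflate by the dupACKs already seen
    return ssthresh, cwnd

def on_dupack(cwnd):
    return cwnd + 1             # each further dupACK: one packet left the pipe

def exit_frfr(ssthresh):
    return ssthresh             # deflate on the first non-dup ACK

ssthresh, cwnd = enter_frfr(flightsize=8)   # ssthresh = 4, cwnd = 7
cwnd = on_dupack(cwnd)                      # 8: one more packet may be sent
cwnd_after = exit_frfr(ssthresh)            # 4: back to CA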
Example: FR/FR
Fast retransmit: retransmit on 3 dupACKs
Fast recovery: inflate window while repairing loss, to fill pipe
[Figure: sender/receiver timeline; packets 1–8 sent, packet 1 lost; 3 dupACKs trigger retransmission with cwnd = 8, ssthresh = 4; further dupACKs inflate the window so new packets 9–11 go out; on the non-dup ACK, cwnd deflates to ssthresh and FR/FR exits.]
Summary: Reno
Basic ideas: fast recovery avoids slow start; dupACKs: fast retransmit + fast recovery; timeout: fast retransmit + slow start
[State diagram: slow start, congestion avoidance, FR/FR; dupACKs enter FR/FR, timeout returns to slow start with retransmission.]
NewReno: Motivation
On 3 dupACKs, receiver has packets 2, 4, 6, 8; cwnd = 8; sender retransmits pkt 1 and enters FR/FR
Next dupACK: increment cwnd to 9
After an RTT, ACK arrives for pkts 1 & 2: exit FR/FR with cwnd = 5, 8 unack'ed pkts
No more ACKs arrive: sender must wait for timeout
[Figure: timeline with packets 1, 3, 5, 7 lost; only one loss is repaired per FR/FR, then timeout.]
NewReno (Fall & Floyd '96; RFC 2582)
Motivation: multiple losses within a window
  A partial ACK acknowledges some but not all packets outstanding at start of FR
  A partial ACK takes Reno out of FR and deflates the window
  Sender may have to wait for timeout before proceeding
Idea: a partial ACK indicates lost packets
  Stay in FR/FR and retransmit immediately
  Retransmit 1 lost packet per RTT until all lost packets from that window are retransmitted
  Eliminates the timeout
SACK (Mathis, Mahdavi, Floyd, Romanow '96; RFC 2018, RFC 2883)
Motivation: Reno & NewReno retransmit at most 1 lost packet per RTT; the pipe can be emptied during FR/FR with multiple losses
Idea: SACK provides a better estimate of packets in the pipe
  SACK TCP option describes received packets
  On 3 dupACKs: retransmit, halve window, enter FR
  Update pipe = packets in pipe: increment when lost or new packets are sent, decrement when a dupACK is received
  Transmit a (lost or new) packet when pipe < cwnd
  Exit FR when all packets outstanding when FR was entered are acknowledged
Outline
Introduction
TCP/AQM algorithms: window flow control; source algorithms (Tahoe, Reno, Vegas); link algorithms (RED, REM)
Duality model: equilibrium of TCP/AQM
Linear dynamic model: dynamics of TCP/AQM
A scalable control
TCP Reno (Jacobson 1990)
[Figure: window vs time; SS then CA, with fast retransmission/fast recovery.]
SS: Slow Start; CA: Congestion Avoidance; fast retransmission/fast recovery
TCP Vegas (Brakmo & Peterson 1994)
[Figure: window vs time; after SS, the window converges to a constant.]
Converges, no retransmission … provided the buffer is large enough
[Figure: queue size.]
for every RTT {
  if W/RTTmin − W/RTT < α then W ++
  if W/RTTmin − W/RTT > α then W −−
}
for every loss
  W := W/2
Vegas CA algorithm
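The per-RTT rule above, written out (α here is the single threshold used in this deck's model; real Vegas uses two thresholds α < β, and the 100/200 ms RTTs below are made-up inputs):

```python
def vegas_update(W, rtt_min, rtt, alpha):
    """One Vegas congestion-avoidance step per RTT.

    Compares expected rate W/rtt_min with actual rate W/rtt (both pkts/s).
    """
    diff = W / rtt_min - W / rtt
    if diff < alpha:
        return W + 1    # little backlog: probe for more bandwidth
    if diff > alpha:
        return W - 1    # too much backlog: back off
    return W

# RTT at baseline: no queueing, so grow the window
w1 = vegas_update(10, rtt_min=0.100, rtt=0.100, alpha=2)
# RTT doubled by queueing: shrink the window
w2 = vegas_update(10, rtt_min=0.100, rtt=0.200, alpha=2)
```

Unlike Reno, the control signal is the rate (backlog) difference, not loss, which is why Vegas can settle to a constant window.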
Implications
Congestion measure = end-to-end queueing delay
At equilibrium: zero loss; stable window at full utilization; approximately weighted proportional fairness; nonzero queue, larger for more sources
Convergence to equilibrium: converges if there is sufficient network buffering; oscillates like Reno otherwise
Outline
Introduction
TCP/AQM algorithms: window flow control; source algorithms (Tahoe, Reno, Vegas); link algorithms (RED, REM)
Duality model: equilibrium of TCP/AQM
Linear dynamic model: dynamics of TCP/AQM
A scalable control
RED (Floyd & Jacobson 1993)
Idea: warn sources of incipient congestion by probabilistically marking/dropping packets
Link algorithm to work with source algorithm (Reno)
Bonus: desynchronization; prevents the bursty losses of buffer overflow
[Figure: router marking profile; marking probability rises from 0 to 1 with the average queue, hosts back off.]
RED
Implementation: probabilistically drop packets, or probabilistically mark packets; marking requires the ECN bit (RFC 2481)
Performance: desynchronization works well; extremely sensitive to parameter setting; fails to prevent buffer overflow as #sources increases
Variant: ARED (Feng, Kandlur, Saha, Shin 1999)
Motivation: RED extremely sensitive to #sources
Idea: adapt maxp to load
  If avg. queue < minth, decrease maxp
  If avg. queue > maxth, increase maxp
No per-flow information needed
Variant: FRED (Ling & Morris 1997)
Motivation: marking packets in proportion to flow rate is unfair (e.g., adaptive vs unadaptive flows)
Idea:
  A flow can buffer up to minq packets without being marked
  A flow that frequently buffers more than maxq packets gets penalized
  All flows with backlogs in between are marked according to RED
  No flow can buffer more than avgcq packets persistently
Needs per-active-flow accounting
Variant: SRED (Ott, Lakshman & Wong 1999)
Motivation: wild oscillation of the queue in RED when load changes
Idea:
  Estimate the number N of active flows: an arriving packet is compared with a randomly chosen recently seen flow; N ~ prob(hit)⁻¹
  cwnd ~ p^(−1/2) and N·p^(−1/2) = Q0 imply p = (N/Q0)²
  Marking prob = m(q) · min(1, p)
No per-flow information needed
Variant: BLUE (Feng, Kandlur, Saha, Shin 1999)
Motivation: wild oscillation of RED leads to cyclic overflow & underutilization
Algorithm:
  On buffer overflow, increment marking prob
  On link idle, decrement marking prob

Variant: SFB
Motivation: protection against nonadaptive flows
Algorithm:
  L hash functions map a packet to L bins (out of N×L)
  The marking probability associated with each bin is incremented if bin occupancy exceeds a threshold, decremented if bin occupancy is 0
  Packets marked with min{p1, …, pL}
[Figure: hash levels h1 … hL; a nonadaptive flow saturates its bins while adaptive flows do not.]
Variant: SFB
Idea:
  A nonadaptive flow drives the marking prob to 1 at all L bins it is mapped to
  An adaptive flow may share some of its L bins with nonadaptive flows
  Nonadaptive flows can be identified and penalized
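A toy rendering of the SFB machinery just described (the salted-hash construction, bin counts, and all parameter values here are illustrative, not from the paper):

```python
import hashlib

L_LEVELS, BINS = 4, 16
prob = [[0.0] * BINS for _ in range(L_LEVELS)]
count = [[0] * BINS for _ in range(L_LEVELS)]

def bins_for(flow_id):
    """One bin per level, via L independent (salted) hash functions."""
    return [int(hashlib.sha1(f"{lvl}:{flow_id}".encode()).hexdigest(), 16) % BINS
            for lvl in range(L_LEVELS)]

def on_packet(flow_id, threshold=5, step=0.05):
    """Update the flow's bins; return its marking probability min{p1..pL}."""
    for lvl, b in enumerate(bins_for(flow_id)):
        count[lvl][b] += 1
        if count[lvl][b] > threshold:
            prob[lvl][b] = min(1.0, prob[lvl][b] + step)
    return min(prob[lvl][b] for lvl, b in enumerate(bins_for(flow_id)))

# A nonadaptive flow hammers the queue and saturates all L of its bins,
# so its marking probability (the min over its bins) is driven to 1.
for _ in range(200):
    heavy = on_packet("nonadaptive-flow")
```

An adaptive flow only inherits a high probability in the bins it happens to share; taking the min over its L bins usually rescues it.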
[Figure: link marking probability vs link congestion measure.]
REM (Athuraliya & Low 2000)
Main ideas:
  Decouple congestion & performance measure
  Price adjusted to match rate and clear buffer
  Marking probability exponential in `price'
[Figure: REM's marking curve vs RED's, as a function of average queue.]
Part II: Models
Congestion Control
Heavy-tailed traffic: mice and elephants
[Figure: mice and elephants (CDN transfers) share the Internet; TCP controls elephant source rates xi(t).]
Congestion control: efficient & fair sharing; small delay (queueing + propagation)
TCP & AQM
[Figure: sources with rates xi(t); links with congestion measures pl(t).]
Example congestion measures pl(t): loss (DropTail), queue length (RED), queueing delay (Vegas), price (REM)
Duality theory (equilibrium): source rates xi(t) are primal variables; congestion measures pl(t) are dual variables; congestion control is an optimization process over the Internet
Control theory (stability & robustness): the Internet is a gigantic feedback system, distributed & delayed
Outline
Introduction
TCP/AQM algorithms
Duality model: equilibrium of TCP/AQM
Linear dynamic model: dynamics of TCP/AQM
A scalable control
Overview: equilibrium
Interaction of source rates xs(t) and congestion measures pl(t)
Duality theory: they are primal and dual variables; flow control is an optimization process
Example congestion measures: loss (Reno), queueing delay (Vegas)
Overview: equilibrium
Congestion control problem:
  max_{xs ≥ 0} Σs Us(xs)   subject to   Σ_{s∈S(l)} xs ≤ cl, for all l
TCP/AQM protocols (F, G) form a primal-dual algorithm maximizing aggregate source utility, with different utility functions Us(xs):
  x(t+1) = F(p(t), x(t))   (Reno, Vegas)
  p(t+1) = G(p(t), x(t))   (DropTail, RED, REM)
Overview: Reno
Equilibrium characterization: duality
Congestion measure p = loss
  Utility: Us^reno(xs) = (√2/τs)·tan⁻¹(xs·τs/√2)
  Equilibrium rate: xi = (1/τi)·√(2(1−qi)/qi) ≈ (1/τi)·√(2/qi) for small loss qi
Implications:
  Reno equalizes windows wi = τi·xi
  Rate inversely proportional to delay τi
  1/√q dependence for small q
  DropTail fills the queue, regardless of queue capacity
Overview: AQM
DropTail: queue = 94% full
RED: min_th = 10 pkts, max_th = 40 pkts, max_p = 0.1
REM: queue = 1.5 pkts, utilization = 92%; γ = 0.05, α = 0.4, φ = 1.15
Decouples congestion & performance measure
Overview: equilibrium
DropTail: high utilization; fills buffer, maximizes queueing delay
RED: couples queue length with congestion measure; high utilization or low delay
REM: decouples performance & congestion measure; matches rate, clears buffer; sums prices
Overview: Vegas
Delay: congestion measure is end-to-end queueing delay qs(t)
  Sets rate xs(t) = αs·ds / qs(t)
  Equilibrium condition: Little's Law
Loss: no loss if convergent (with sufficient buffer); otherwise reverts to Reno (loss unavoidable)
Fairness: utility function Us^vegas(xs) = αs·ds·log xs; proportional fairness
Model
Sources s: L(s) = links used by source s; Us(xs) = utility if source rate = xs
Network: links l of capacities cl
[Example: two links of capacities c1, c2; x1 uses both links, x2 uses link 1, x3 uses link 2: x1 + x2 ≤ c1, x1 + x3 ≤ c2.]
Primal problem
  max_{xs ≥ 0} Σs Us(xs)   subject to   Σ_{s∈S(l)} xs ≤ cl, for all l
Assumptions:
  Strictly concave increasing Us
  Unique optimal rates xs exist
  Direct solution impractical
Prior Work
Formulation: Kelly 1997
Penalty function approach: Kelly, Maulloo and Tan 1998; Kunniyur and Srikant 2000
Duality approach: Low and Lapsley 1999; Athuraliya and Low 2000; Low 2000
Extensions: Mo & Walrand 1998; La & Anantharam 2000
Example
  max_{xs ≥ 0} Σs log xs   subject to   x1 + x2 ≤ 1, x1 + x3 ≤ 1
[Network: two unit-capacity links; x1 uses both, x2 uses link 1, x3 uses link 2.]
Lagrange multipliers: p1 = p2 = 3/2
Optimal: x1 = 1/3, x2 = x3 = 2/3
xs: proportionally fair
pl: Lagrange multiplier, (shadow) price, congestion measure
How to compute (x, p)? Relevance to TCP/AQM?
Example
xs: proportionally fair (Vegas)
pl: Lagrange multiplier, (shadow) price, congestion measure
How to compute (x, p)? Gradient algorithms, Newton algorithm, primal-dual algorithms, …
Relevance to TCP/AQM? TCP/AQM protocols implement primal-dual algorithms over the Internet
Example
Source rates:
  x1(t+1) = 1 / (p1(t) + p2(t))
  x2(t+1) = 1 / p1(t)
  x3(t+1) = 1 / p2(t)
Link prices (aggregate rate vs capacity):
  p1(t+1) = [p1(t) + γ(x1(t) + x2(t) − 1)]⁺
  p2(t+1) = [p2(t) + γ(x1(t) + x3(t) − 1)]⁺
xs: proportionally fair (Vegas)
pl: Lagrange multiplier, (shadow) price, congestion measure
How to compute (x, p)? Gradient algorithms, Newton algorithm, primal-dual algorithms, …
Relevance to TCP/AQM? TCP/AQM protocols implement primal-dual algorithms over the Internet
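This dual-gradient iteration runs as-is; a small sketch (the stepsize γ = 0.1 and iteration count are arbitrary choices) recovers the optimum computed earlier, x1 = 1/3, x2 = x3 = 2/3, p1 = p2 = 3/2:

```python
# Dual gradient for: max log x1 + log x2 + log x3
#   s.t. x1 + x2 <= 1 (link 1),  x1 + x3 <= 1 (link 2)
gamma = 0.1
p1 = p2 = 1.0
for _ in range(5000):
    # sources: x_s maximizes U_s(x) - x * (path price); U_s = log => x_s = 1/price
    x1 = 1.0 / (p1 + p2)
    x2 = 1.0 / p1
    x3 = 1.0 / p2
    # links: raise price when demand exceeds capacity, lower it otherwise
    p1 = max(0.0, p1 + gamma * (x1 + x2 - 1.0))
    p2 = max(0.0, p2 + gamma * (x1 + x3 - 1.0))
```

Each update uses only local information: sources see their path price, links see their aggregate rate, which is exactly what makes the scheme deployable as TCP/AQM.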
Duality Approach
Primal:  max_{xs ≥ 0} Σs Us(xs)   subject to   Σ_{s∈S(l)} xs ≤ cl, for all l
Dual:    min_{p ≥ 0} D(p) = max_{x ≥ 0} Σs Us(xs) + Σl pl·(cl − x^l),
         where x^l = Σ_{s∈S(l)} xs is the aggregate rate at link l
Primal-dual algorithm:
  x(t+1) = F(p(t), x(t))
  p(t+1) = G(p(t), x(t))
Duality Model of TCP
Primal-dual algorithm:
  x(t+1) = F(p(t), x(t))   (Reno, Vegas)
  p(t+1) = G(p(t), x(t))   (DropTail, RED, REM)
TCP iterates on rates (windows); AQM iterates on congestion measures; with different utility functions
Summary
Congestion control problem:
  max_{xs ≥ 0} Σs Us(xs)   subject to   Σ_{s∈S(l)} xs ≤ cl, for all l
TCP/AQM protocols (F, G) form a primal-dual algorithm maximizing aggregate source utility, with different utility functions Us(xs):
  x(t+1) = F(p(t), x(t))   (Reno, Vegas)
  p(t+1) = G(p(t), x(t))   (DropTail, RED, REM)
(F, G, U) model
Derivation: derive (F, G) from the protocol description; the fixed point (x, p) of (F, G) gives the equilibrium; derive U by regarding the fixed point as a Kuhn-Tucker condition
Application (equilibrium properties): performance (throughput, loss, delay, queue length); fairness, friendliness
Outline
Introduction
TCP/AQM algorithms
Duality model (F, G, U): queue management G (RED, REM); TCP F and U (Reno, Vegas)
Linear dynamic model: dynamics of TCP/AQM
A scalable control
Active queue management
Idea: provide congestion information by probabilistically marking packets
Issues: how to measure congestion (p and G)? how to embed the congestion measure? how to feed back congestion info?
  x(t+1) = F(p(t), x(t))   (Reno, Vegas)
  p(t+1) = G(p(t), x(t))   (DropTail, RED, REM)
RED (Floyd & Jacobson 1993)
Congestion measure: average queue length: pl(t+1) = [pl(t) + xl(t) − cl]⁺
Embedding: linear probability function of the average queue
Feedback: dropping or ECN marking
[Figure: RED marking probability vs average queue.]
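RED's marking profile, using the parameter values quoted elsewhere in this deck (min_th = 10, max_th = 40, max_p = 0.1); note that classic RED jumps from max_p to 1 at max_th, while the "gentle" variant ramps instead:

```python
def red_mark_prob(avg_q, min_th=10.0, max_th=40.0, max_p=0.1):
    """RED marking probability as a function of the average queue length."""
    if avg_q < min_th:
        return 0.0                                           # no marking below min_th
    if avg_q < max_th:
        return max_p * (avg_q - min_th) / (max_th - min_th)  # linear ramp
    return 1.0                                               # force marking above max_th

probs = [red_mark_prob(q) for q in (5, 25, 50)]
```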
REM (Athuraliya & Low 2000)
Congestion measure: price: pl(t+1) = [pl(t) + γ(αl·bl(t) + xl(t) − cl)]⁺
Embedding: exponential probability function
Feedback: dropping or ECN marking
[Figure: REM marking probability vs link congestion measure.]
Key features:
  pl(t+1) = [pl(t) + γ(αl·bl(t) + x̂l(t) − cl)]⁺: clear buffer (backlog term) and match rate
  Marking: ml(t) = 1 − φ^(−pl(t)); end-to-end marking probability 1 − φ^(−Σl pl(t)): sum prices
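A sketch of the two REM pieces: the price update, and exponential marking whose composition across links yields the sum of prices (the parameter values γ = 0.05, α = 0.4, φ = 1.15 are the ones quoted in this deck; the rates and prices below are made up):

```python
def rem_price(p, backlog, rate, capacity, gamma=0.05, alpha=0.4):
    """REM price: grows while the buffer is nonempty or demand exceeds capacity."""
    return max(0.0, p + gamma * (alpha * backlog + rate - capacity))

def rem_mark(p, phi=1.15):
    """Marking probability exponential in the price."""
    return 1.0 - phi ** (-p)

# independent marking at two links 'sums the prices':
# 1 - (1-m1)(1-m2) = 1 - phi^-(p1+p2)
m1, m2 = rem_mark(1.0), rem_mark(2.0)
end_to_end = 1.0 - (1.0 - m1) * (1.0 - m2)
```

At equilibrium (empty buffer, input rate equal to capacity) the price is stationary, which is exactly the "match rate, clear buffer" property.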
Theorem (Paganini 2000)
Global asymptotic stability for general utility function (in the absence of delay)
Active Queue Management
            pl(t)   G(p(t), x(t))
  DropTail  loss    [1 − cl/xl(t)]⁺  (?)
  RED       queue   [pl(t) + xl(t) − cl]⁺
  Vegas     delay   [pl(t) + xl(t)/cl − 1]⁺
  REM       price   [pl(t) + γ(αl·bl(t) + xl(t) − cl)]⁺
x(t+1) = F(p(t), x(t));  p(t+1) = G(p(t), x(t))

Congestion & performance
Decouple congestion & performance measure
  RED: `congestion' = `bad performance'
  REM: `congestion' = `demand exceeds supply', but performance remains good!
Outline
Introduction
TCP/AQM algorithms
Duality model (F, G, U): queue management G (RED, REM); TCP F and U (Reno, Vegas)
Linear dynamic model: dynamics of TCP/AQM
A scalable control
Utility functions
  Reno:  Ui^reno(xi) = (√2/τi)·tan⁻¹(xi·τi/√2)
  Vegas: Ui^vegas(xi) = αi·di·log xi

Reno: F
  for every ACK (CA) { W += 1/W }
  for every loss { W := W/2 }
Window dynamics:
  ws(t+1) = ws(t) + xs(t)·(1 − p(t))/ws(t) − xs(t)·p(t)·ws(t)/2
In rates:
  Fs(p(t), x(t)) = xs(t) + (1 − p(t))/Ds² − p(t)·xs²(t)/2
Primal-dual algorithm:
  x(t+1) = F(p(t), x(t))   (Reno, Vegas)
  p(t+1) = G(p(t), x(t))   (DropTail, RED, REM)
Implications
Equilibrium characterization: duality; congestion measure p = loss
  Utility: Us^reno(xs) = (√2/τs)·tan⁻¹(xs·τs/√2)
  Equilibrium rate: xi = (1/τi)·√(2(1−qi)/qi) ≈ (1/τi)·√(2/qi) for small loss qi
Implications:
  Reno equalizes windows wi = τi·xi
  Rate inversely proportional to delay τi
  1/√q dependence for small q
  DropTail fills the queue, regardless of queue capacity
Validation - Reno
30 sources, 3 groups with RTT = 3, 5, 7ms + 6ms (queueing delay) Link capacity = 64 Mbps, buffer = 50 kB
Measured windows equalized, match well with theory (black line)
Validation – Reno/RED
30 sources, 3 groups with RTT = 3, 5, 7 ms + 6 ms (queueing delay) Link capacity = 64 Mbps, buffer = 50 kB
Validation – Reno/REM
30 sources, 3 groups with RTT = 3, 5, 7 ms Link capacity = 64 Mbps, buffer = 50 kB Smaller window due to small RTT (~0 queueing delay)
Queue
DropTail: queue = 94% full
RED: min_th = 10 pkts, max_th = 40 pkts, max_p = 0.1; p = Lagrange multiplier! p increasing in queue!
REM: queue = 1.5 pkts, utilization = 92%; γ = 0.05, α = 0.4, φ = 1.15; p decoupled from queue
Summary
DropTail: high utilization; fills buffer, maximizes queueing delay
RED: couples queue length with congestion measure; high utilization or low delay
REM: decouples performance & congestion measure; matches rate, clears buffer; sums prices
Gradient algorithm
  source: xi(t+1) = Ui'⁻¹(qi(t))
  link:   pl(t+1) = [pl(t) + γ(yl(t) − cl)]⁺
Theorem (ToN '99): converges to optimal rates in an asynchronous environment

Reno & gradient algorithm
  Reno: Fi(qi(t), xi(t)) = xi(t) + (1 − qi(t))/Di² − qi(t)·xi²(t)/2
  TCP: approximate version of the gradient algorithm
Reno & gradient algorithm
Gradient algorithm
  source: xi(t+1) = Ui'⁻¹(qi(t))
  link:   pl(t+1) = [pl(t) + γ(yl(t) − cl)]⁺
Reno's update can be written
  xi(t+1) = xi(t) + (qi(t)/2)·(x̂i²(t) − xi²(t)),  with x̂i(t) = Ui'⁻¹(qi(t))
TCP: approximate version of the gradient algorithm
Outline
Introduction
TCP/AQM algorithms
Duality model (F, G, U): queue management G (RED, REM); TCP F and U (Reno, Vegas)
Linear dynamic model: dynamics of TCP/AQM
A scalable control
[Figure: queue size.]
for every RTT {
  if W/RTTmin − W/RTT < α then W ++
  if W/RTTmin − W/RTT > α then W −−
}
for every loss
  W := W/2
Vegas
F: source update
  xs(t+1) = xs(t) + 1/Ds²(t)  if ws(t)/ds − xs(t) < αs
  xs(t+1) = xs(t) − 1/Ds²(t)  if ws(t)/ds − xs(t) > αs
  xs(t+1) = xs(t)             otherwise
G: pl(t+1) = [pl(t) + (xl(t)/cl − 1)]⁺
Implications
Performance: rate, delay, queue, loss
Interaction: fairness, TCP-friendliness
Utility function
Performance
Delay: congestion measure is end-to-end queueing delay qs(t)
  Sets rate xs(t) = αs·ds / qs(t)
  Equilibrium condition: Little's Law
Loss: no loss if convergent (with sufficient buffer); otherwise reverts to Reno (loss unavoidable)
Vegas Utility
  Us^vegas(xs) = αs·ds·log xs
Equilibrium: fixed point (x, p) of (F, G)
Proportional fairness
Vegas & Gradient Algorithm
Basic algorithm, source: xs(t+1) = Us'⁻¹(ps(t))
TCP: smoothed version of the basic algorithm:
  xs(t+1) = xs(t) + 1/Ds²(t)  if xs(t) < Us'⁻¹(ps(t))
  xs(t+1) = xs(t) − 1/Ds²(t)  if xs(t) > Us'⁻¹(ps(t))
  xs(t+1) = xs(t)             otherwise
Validation
Single link, capacity = 6 pkts/ms; 5 sources with different propagation delays; αs = 2 pkts/RTT
Measured (theory):
  Source              1            3            5
  RTT (ms)            17.1 (17)    21.9 (22)    41.9 (42)
  Rate (pkts/s)       1205 (1200)  1228 (1200)  1161 (1200)
  Window (pkts)       20.5 (20.4)  27 (26.4)    49.8 (50.4)
  Avg backlog (pkts)  9.8 (10)
Persistent congestion
Vegas exploits the buffer process to compute prices (queueing delays)
Persistent congestion due to: coupling of buffer & price; error in propagation delay estimation
Consequences: excessive backlog; unfairness to older sources
Theorem: a relative error of εs in propagation delay estimation distorts the utility function to
  Ûs(xs) = (1 + εs)·αs·ds·log xs + εs·ds·xs
Evidence
Single link, capacity = 6 pkt/ms; αs = 2 pkts/ms; ds = 10 ms
With finite buffer: Vegas reverts to Reno
Without estimation error / with estimation error:
[Table: source rates (pkts/ms), measured (theory), as sources join:
  1 src:  5.98 (6)
  2 srcs: 2.05 (2), 3.92 (4)
  3 srcs: 0.96 (0.94), 1.46 (1.49), 3.54 (3.57)
  4 srcs: 0.51 (0.50), 0.72 (0.73), 1.34 (1.35), 3.38 (3.39)
  5 srcs: 0.29 (0.29), 0.40 (0.40), 0.68 (0.67), 1.30 (1.30), 3.28 (3.34)
Queue (pkts) and baseRTT (ms) by #sources:
  1: 19.8 (20),  10.18 (10.18)
  2: 59.0 (60),  13.36 (13.51)
  3: 127.3 (127), 20.17 (20.28)
  4: 237.5 (238), 31.50 (31.50)
  5: 416.3 (416), 49.86 (49.80)]
Vegas/REM
To preserve the Vegas utility function & rates:
  REM clears the buffer, so the RTT measurement yields a correct estimate of the propagation delay ds
  REM sums prices, so the source obtains the end-to-end price ps
Source update (Vegas form, with the end-to-end price replacing queueing delay):
  xs(t+1) = xs(t) + 1/Ds²(t)  if xs(t) < x̂s(t)
  xs(t+1) = xs(t) − 1/Ds²(t)  if xs(t) > x̂s(t)
  xs(t+1) = xs(t)             otherwise
  where x̂s(t) = αs·ds / ps(t)
Performance
Vegas/REM vs Vegas
Peak queue = 43 pkts; utilization: 90% to 96%
Conclusion
Duality model of TCP: (F, G, U)
  x(t+1) = F(p(t), x(t))   (Reno, Vegas): maximize aggregate utility, with different utility functions
  p(t+1) = G(p(t), x(t))   (DropTail, RED, REM): decouple congestion & performance; match rate, clear buffer; sum prices
Food for thought
How to tailor utility to application?
  Choosing the congestion control automatically fixes the utility function
  Conversely, a desired utility function can be used to determine the congestion control
Outline
Introduction
TCP/AQM algorithms
Duality model (F, G, U): equilibrium of TCP/AQM
Linear dynamic model: stability of TCP/AQM
A scalable control
Model assumptions
Small marking probabilities; end-to-end marking probability
  1 − Πl (1 − pl) ≈ Σl pl
Congestion avoidance dominates; receiver not limiting
Decentralized:
  TCP algorithm depends only on its end-to-end measure of congestion
  AQM algorithm depends only on local & aggregate rate or queue
Constant (equilibrium) RTT
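The small-p approximation in the first assumption is easy to check numerically (the per-link probabilities below are arbitrary small values):

```python
# end-to-end marking probability vs the sum-of-prices approximation
ps = [0.001, 0.002, 0.0005]     # per-link marking probabilities (small)

exact = 1.0
for p in ps:
    exact *= (1.0 - p)
exact = 1.0 - exact             # 1 - prod(1 - p_l)

approx = sum(ps)                # sum over links of p_l
# the two agree to within roughly the products of the p's
```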
Dynamics
Small effect on queue: AIMD, mice traffic, heterogeneity
Big effect on queue: stability!
Stable: 20ms delay
Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
[Figure: individual and average window (pkts) vs time (ms).]
[Figure: instantaneous queue (pkts) vs time (ms).]
Unstable: 200ms delay
Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
[Figure: individual and average window (pkts) vs time (10ms).]
[Figure: instantaneous queue (pkts) vs time (10ms).]
Other effects on queue
[Figure: instantaneous queue (pkts), 20ms vs 200ms delay.]
[Figure: instantaneous queue (pkts) with 50% noise traffic.]
[Figure: instantaneous queue (pkts), avg delay 16ms vs 208ms.]
Effect of instability
Larger jitter in throughput; lower utilization
[Figure: mean and variance of the individual window vs delay (ms), no noise.]
Linearized system
[Block diagram: TCP sources F1 … FN, forward routing Rf(s), links G1 … GL, backward routing Rb'(s); signals x, y, q, p.]
  Gl: pl(t+1) = G̃l(pl(t), zl(t), yl(t))
  Fi: Fi(qi(t), xi(t)) = xi(t) + (1 − qi(t))/Di² − qi(t)·xi²(t)/2
Loop gain L(s) computed for TCP/RED with queue and delay.
Stability condition
[Nyquist plot of h(ν, θ).]
Theorem: TCP/RED is stable if the loop gain, which grows like c³τ³·|H|, stays below a threshold
Satisfying the condition requires a small gain (slow response) or small delay
[Figure: LHS of the stability condition vs delay (ms), for c = 10 to 50 pkts/ms, N = 200.]
Validation
[Figure: round-trip propagation delay at critical frequency (ms), model vs NS, 30 data points.]
[Figure: critical frequency (Hz), dynamic-link and static-link models vs NS, 30 data points.]
Stability region
[Figure: round-trip propagation delay at critical frequency vs capacity (pkts/ms), N = 20 to 60.]
Unstable for: large delay, large capacity, small load
Role of AQM
(Sufficient) stability condition: co{ K·λ(ω; H) } must stay away from −1
TCP contributes a large gain at high capacity and low load, with delay factor e^(−jωτ)
AQM's role: scale down K & shape H; RED and REM/PI shape the feedback differently
Queue dynamics (windowing): problem!!
Scalable control? (REM)
[Figure: sources xi(t), links pl(t).]
Stability, utilization, delay
Outline
Introduction
TCP/AQM algorithms
Duality model (F, G, U): equilibrium of TCP/AQM
Linear dynamic model: stability of TCP/AQM
A scalable control
Delay compensation
Equilibrium:
High utilization: integrator at links
Low loss & delay: VQ, REM, PI (HMTG01a)
Stability: an integrator plus network delay is always unstable for sufficiently high gain!
Delay invariance: scale down the gain by the RTT tau (known at sources); this is self-clocking!
[Diagram: source - network - link feedback loop with delay elements]
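The delay-invariance claim can be sanity-checked on the simplest case: a pure integrator loop L(s) = K e^(-s*tau)/s, whose Nyquist margin depends only on the product K*tau. The sketch below is a numerical check I added (not from the slides); it shows that scaling the gain K down by the RTT tau makes the margin identical for every delay:

```python
import cmath
import math

def crossover_gain(K, tau):
    """Loop transfer L(jw) = K * exp(-j*w*tau) / (j*w).
    The phase crosses -180 degrees where w*tau = pi/2, i.e. w180 = pi/(2*tau);
    by the Nyquist criterion the loop is stable iff |L(j*w180)| < 1,
    which reduces to K*tau < pi/2."""
    w180 = math.pi / (2.0 * tau)
    return abs(K * cmath.exp(-1j * w180 * tau) / (1j * w180))

# scale the gain down by the (source-known) RTT: K = k0 / tau
margins = [crossover_gain(1.0 / tau, tau) for tau in (0.01, 0.1, 1.0)]
# every delay gives the same margin 2/pi < 1: delay-invariant stability
```

The point is that K*tau is the only quantity that matters, so a source that divides its gain by its own measured RTT neutralizes the destabilizing effect of delay.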
Compensation for capacity
Speed of adaptation increases with:
link capacity
# bottleneck links in path
Capacity invariance: scale down by the capacity cl at links; scale up by the rate xi(t) at sources.
Scalable control (Paganini, Doyle, Low '01)
AQM:  dpl(t)/dt = ( yl(t) - cl ) / cl
TCP:  xi(t) = xmax,i * exp( -alpha_i qi(t) / (Mi tau_i) )
where qi(t) is the aggregate price along source i's path, tau_i its RTT, and Mi a bound on the number of bottleneck links in its path.
Theorem (Paganini, Doyle, Low 2001): provided R is full rank, the feedback loop is locally stable for arbitrary delay, capacity, load and topology.
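A minimal Euler discretization of a control law of this form, for one source and one link with feedback delayed by one RTT. All names and parameter values below are illustrative choices of mine, not from the paper; it is a sketch of the structure, not the authors' implementation:

```python
import math

def simulate(c=100.0, tau=0.1, alpha=1.0, M=1, xmax=200.0, dt=0.001, T=20.0):
    """Single source, single link:
      link price:  dp/dt = (y(t) - c) / c        (integrator scaled by capacity)
      source rate: x(t)  = xmax * exp(-alpha * q(t) / (M * tau))
    where q is the link price fed back to the source one RTT late."""
    steps, delay = int(T / dt), int(tau / dt)
    buf = [0.0] * delay                 # ring buffer of past prices (RTT delay)
    p = 0.0
    for k in range(steps):
        q = buf[k % delay]                            # price from one RTT ago
        x = xmax * math.exp(-alpha * q / (M * tau))   # static source law
        p = max(0.0, p + dt * (x - c) / c)            # projected price integrator
        buf[k % delay] = p
    return x, p

x, p = simulate()
# the source rate settles at the link capacity, x ~ c = 100
```

The gain is divided by the RTT at the source and by the capacity at the link, matching the delay and capacity compensation described on the previous slides; the loop converges for the (conservative) alpha chosen here.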
[Figure: link marking probability (0-1) vs link congestion measure (0-20)]
Utility function
Stability: 40 ms delay
[Figure: individual window (pkts) vs time (sec)]
[Figure: instantaneous queue (pkts, 0-500) vs time (sec)]
50 sources, 40 ms delay, 54% noise
[Figure: instantaneous queue (pkts, 0-1000) vs time (sec)]
200 ms delay
[Figure: individual window (pkts) vs time (sec)]
[Figure: instantaneous queue (pkts, 0-7000) vs time (sec)]
56% noise
[Figure: instantaneous queue (pkts, 0-7000) vs time (sec)]
Papers
Scalable laws for stable network congestion control (CDC2001)
A duality model of TCP flow controls (ITC, Sept 2000)
Optimization flow control, I: basic algorithm & convergence (ToN, 7(6), Dec 1999)
Understanding Vegas: a duality model (ACM Sigmetrics, June 2000)
REM: active queue management (Network, May/June 2001)
netlab.caltech.edu
The End
Backup slides
Law
Equilibrium window size:  ws = a / sqrt(p)
Equilibrium rate:  xs = a / (Ds * sqrt(p))
Empirically, the constant a ~ 1. Verified extensively through simulations and on the Internet.
References:
T. J. Ott, J. H. B. Kemperman and M. Mathis (1996)
M. Mathis, J. Semke, J. Mahdavi, T. Ott (1997)
T. V. Lakshman and U. Madhow (1997)
J. Padhye, V. Firoiu, D. Towsley, J. Kurose (1998)
J. Padhye, V. Firoiu, D. Towsley (1999)
Implications
Applicability:
Additive increase, multiplicative decrease (Reno)
Congestion avoidance dominates
No timeouts, e.g., SACK+RH
Small losses
Persistent, greedy sources
Receiver not bottleneck
Implications:
Reno equalizes window
Reno discriminates against long connections
Derivation (I)
Each cycle delivers 2w^2/3 packets (the area under one sawtooth cycle).
Assume each cycle delivers 1/p packets: 1/p packets are delivered, followed by a drop, so the loss probability is p/(1+p) ~ p when p is small.
Hence 2w^2/3 = 1/p, i.e. w = sqrt(3/(2p)).
[Figure: window sawtooth vs time, oscillating between 2w/3 and 4w/3 with mean w = (4w/3 + 2w/3)/2; area per cycle = 2w^2/3]
Derivation (II)
Assume: loss occurs as a Bernoulli process with rate p; most time is spent in CA; p is small. Let wn be the window size after the nth RTT:
wn+1 = wn/2,   if a packet is lost (prob. 1 - (1-p)^wn ~ p*wn)
wn+1 = wn + 1, if no packet is lost (prob. (1-p)^wn)
Setting the expected change per RTT to zero:
(1 - p*w) * 1 = p*w * (w/2)
so w^2 ~ 2(1 - pw)/p, giving w ~ sqrt(2/p) for small p.
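This drift argument is easy to check by Monte Carlo. The sketch below is my own illustrative simulation (not from the slides): it runs the per-RTT model above and verifies the square-root law, i.e. quadrupling p halves the average window, and the constant a = w*sqrt(p) stays near 1:

```python
import random

def mean_window(p, rtts=200000, seed=1):
    """Per-RTT model from the derivation: with window w, all w packets
    survive the RTT with prob (1-p)**w (then w -> w+1); otherwise w -> w/2."""
    random.seed(seed)
    w, total = 10.0, 0.0
    for _ in range(rtts):
        if random.random() < (1.0 - p) ** w:
            w += 1.0                 # no loss this RTT: additive increase
        else:
            w = max(1.0, w / 2.0)    # loss: multiplicative decrease
        total += w
    return total / rtts

w1, w2 = mean_window(0.0004), mean_window(0.0016)
ratio = w1 / w2              # p differs by 4x, so windows should differ by ~2x
a = w1 * 0.0004 ** 0.5       # empirical constant in w = a / sqrt(p)
```

The simulated constant a falls between the sawtooth value sqrt(3/2) and the drift value sqrt(2), consistent with the slide's "a ~ 1".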
Simulations
Renewal model including FR/FR (fast retransmit/fast recovery), with delayed ACKs (b packets per ACK), timeouts, and receiver awnd limitation.
Refinement (Padhye, Firoiu, Towsley & Kurose 1998)
Source rate:
xs = min( Wr/Ds ,  1 / [ Ds*sqrt(2bp/3) + To*min(1, 3*sqrt(3bp/8))*p*(1 + 32p^2) ] )
where Wr is the receiver window, To the timeout, and b the number of packets per ACK.
When p is small and Wr is large, this reduces to xs = a / (Ds*sqrt(p)).
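As a worked illustration, the full formula and its small-p limit can be compared numerically. The encoding below is mine (variable names are my choices), following the expression on this slide:

```python
from math import sqrt

def pftk_rate(p, D, To, b=2, Wr=float("inf")):
    """PFTK send-rate approximation (packets/sec): p = loss probability,
    D = round-trip time (s), To = timeout (s), b = packets per ACK,
    Wr = receiver window (pkts)."""
    denom = (D * sqrt(2.0 * b * p / 3.0)
             + To * min(1.0, 3.0 * sqrt(3.0 * b * p / 8.0)) * p * (1.0 + 32.0 * p * p))
    return min(Wr / D, 1.0 / denom)

# small p, large Wr, b = 1: reduces to the square-root law x = a/(D*sqrt(p))
p, D = 1e-6, 0.1
full = pftk_rate(p, D, To=1.0, b=1)
simple = sqrt(3.0 / 2.0) / (D * sqrt(p))   # a = sqrt(3/2) when b = 1
```

For small p the timeout term is negligible, so the two expressions agree to within a fraction of a percent; the receiver-window cap Wr/D takes over when Wr is small.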
Further Refinements
Further refinements of the previous formula: Padhye, Firoiu, Towsley and Kurose (1999)
Other loss models: E. Altman, K. Avrachenkov and C. Barakat (Sigcomm 2000)
Square root of p still appears!
Dynamic models of TCP, e.g., RTT evolves as window increases
Link layer protocols
Interference suppression: reduces link error rate (power control, spreading gain control)
Forward error correction (FEC): improves link reliability
Link layer retransmission: hides loss from transport layer; source may timeout while the BS retransmits
Split TCP
Each TCP connection is split into two: between source and BS, and between BS and mobile.
Disadvantages:
TCP not suitable for lossy link
Overhead: packets TCP-processed twice at BS (vs. 0)
Violates end-to-end semantics
Per-flow information at BS complicates handover
[Diagram: two back-to-back TCP connections meeting at the BS]
Snoop protocol
Snoop agent: monitors packets in both directions; detects loss by dupACKs or local timeout; retransmits the lost packet; suppresses dupACKs.
Disadvantages: cannot shield all wireless losses; one agent per TCP connection; source may timeout while the BS retransmits.
[Diagram: TCP connection with a snooper at the BS]
Explicit Loss Notification
Noncongestion losses are marked in ACKs.
Source retransmits but does not reduce its window.
Effective in improving throughput.
Disadvantages: overhead (TCP option); may not be able to distinguish types of losses, e.g., corrupted headers.
REM (Athuraliya & Low 2000)
Congestion measure: price
pl(t+1) = [ pl(t) + gamma*( alpha_l bl(t) + xl(t) - cl ) ]+
Embedding: exponential marking probability ml(t) = 1 - phi^(-pl(t)), phi > 1
Feedback: dropping or ECN marking
[Figure: link marking probability (0-1) vs link congestion measure (0-20)]
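A toy encoding of the REM price update and exponential marking (parameter values here are illustrative; gamma and alpha_l would be tuned in practice). A key property of the exponential embedding is that the end-to-end mark probability along a path depends only on the sum of the link prices:

```python
def rem_step(p, backlog, x, c, gamma=0.001, alpha=0.1):
    """One REM price update: p <- [p + gamma*(alpha*b + x - c)]^+ .
    Price rises when there is backlog or the input rate x exceeds capacity c."""
    return max(0.0, p + gamma * (alpha * backlog + x - c))

def mark_prob(p, phi=1.1):
    """Exponential embedding: mark probability 1 - phi**(-p)."""
    return 1.0 - phi ** (-p)

# at equilibrium (empty buffer, input rate matched to capacity) the price is steady
p_eq = rem_step(0.5, backlog=0.0, x=100.0, c=100.0)

# marks compose across two links: 1-(1-m1)(1-m2) equals the mark
# probability computed from the summed price p1+p2
m1, m2 = mark_prob(2.0), mark_prob(3.0)
combined = 1.0 - (1.0 - m1) * (1.0 - m2)
```

Because 1 - mark_prob(p) = phi^(-p), independent marking at each link multiplies these factors, so a source observing the end-to-end mark rate effectively sees the aggregate path price.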
Performance Comparison
RED: with buffer size B, a large B gives high utilization and a small B gives low delay/loss, so it achieves one OR the other.
REM: matching rate gives high utilization; clearing the buffer gives low delay/loss; it achieves both (AND).
Comparison with RED
[Figures: goodput, loss, and queue, REM vs RED]
Application: Wireless TCP
Reno uses loss as its congestion measure.
In wireless, significant losses are due to fading, interference, and handover, not buffer overflow (congestion).
Halving the window is too drastic: small throughput, low utilization.
Proposed solutions
Ideas: hide noncongestion losses from the source, or inform the source of noncongestion losses.
Approaches: link layer error control; split TCP; snoop agent; SACK+ELN (Explicit Loss Notification).
Third approach
Problem: Reno uses loss as its congestion measure, but there are two types of losses:
Congestion loss: retransmit + reduce window
Noncongestion loss: retransmit only
Previous approaches hide or indicate noncongestion losses.
Our approach eliminates congestion losses (buffer overflows).
Third approach
Idea: REM clears the buffer, leaving only noncongestion losses; retransmit lost packets without reducing the window.
Router: REM capable.
Host: does not use loss as a congestion measure (e.g., Vegas over REM).
Performance
[Figure: goodput comparison]
Food for thought
How to tailor the utility to the application? Choosing a congestion control automatically fixes the utility function; can we instead use the utility function to determine the congestion control?
Incremental deployment strategy? What if some, but not all, routers are ECN-capable?
Acronyms
ACK Acknowledgement
AQM Active Queue Management
ARP Address Resolution Protocol
ARQ Automatic Repeat reQuest
ATM Asynchronous Transfer Mode
B Byte (or octet) = 8 bits
bps bits per second
BSD Berkeley Software Distribution
CA Congestion Avoidance
ECN Explicit Congestion Notification
FIFO First In First Out
FTP File Transfer Protocol
HTTP Hyper Text Transfer Protocol
IAB Internet Architecture Board
ICMP Internet Control Message Protocol
IETF Internet Engineering Task Force
IP Internet Protocol
ISOC Internet Society
MSS Maximum Segment Size
MTU Maximum Transmission Unit
POS Packet Over SONET
QoS Quality of Service
RED Random Early Detection/Discard
RFC Request for Comment
RTO Retransmission TimeOut
RTT Round Trip Time
SACK Selective ACKnowledgement
SONET Synchronous Optical NETwork
SS Slow Start
SYN Synchronization Packet
TCP Transmission Control Protocol
UDP User Datagram Protocol
VQ Virtual Queue
WWW World Wide Web