Congestion in networks
Transport Layer
TCP reliable data transfer
- TCP creates an rdt (reliable data transfer) service on top of IP's unreliable service:
  - pipelined segments
  - cumulative ACKs
  - single retransmission timer
- Retransmissions are triggered by:
  - timeout events
  - duplicate ACKs
- Let's initially consider a simplified TCP sender:
  - ignore duplicate ACKs
  - ignore flow control, congestion control
TCP sender events
- Data received from application:
  - create segment with seq #
  - seq # is the byte-stream number of the first data byte in the segment
  - start timer if not already running
    - think of the timer as being for the oldest unACKed segment
    - expiration interval: TimeOutInterval
- Timeout:
  - retransmit the segment that caused the timeout
  - restart timer
- ACK received:
  - if the ACK acknowledges previously unACKed segments:
    - update what is known to be ACKed
    - start timer if there are still unACKed segments
TCP sender (simplified)
Initialization (Λ, no triggering event):
  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum
Then wait for an event.
Event: data received from application above
  create segment, seq #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
Event: timeout
  retransmit not-yet-ACKed segment with smallest seq #
  start timer
Event: ACK received, with ACK field value y
  if (y > SendBase) {
    SendBase = y
    /* SendBase-1: last cumulatively ACKed byte */
    if (there are currently not-yet-ACKed segments)
      start timer
    else stop timer
  }
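The three events above can be sketched in Python. This is a minimal illustration under stated assumptions, not a real TCP implementation: the class name, the `sent` list (a stand-in for "pass segment to IP"), and the boolean `timer_running` flag (instead of a real timer) are all hypothetical.

```python
class SimplifiedTcpSender:
    """Sketch of the simplified sender: no duplicate-ACK handling,
    no flow control, no congestion control."""

    def __init__(self, initial_seq_num=0):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.timer_running = False  # assumption: a flag stands in for a real timer
        self.unacked = {}           # seq # -> payload, sent but not yet ACKed
        self.sent = []              # assumption: stand-in for "pass segment to IP"

    def on_app_data(self, data):
        """Event: data received from application above."""
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        """Event: timeout. Retransmit the not-yet-ACKed segment with smallest seq #."""
        if self.unacked:
            seq = min(self.unacked)
            self.sent.append((seq, self.unacked[seq]))
        self.timer_running = bool(self.unacked)

    def on_ack(self, y):
        """Event: ACK received, with ACK field value y (cumulative)."""
        if y > self.send_base:
            self.send_base = y
            # Drop every segment whose last byte is below y.
            for seq in list(self.unacked):
                if seq + len(self.unacked[seq]) <= y:
                    del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Feeding it two writes of 8 and 20 bytes starting at sequence number 92 reproduces the Seq=92 / Seq=100 / ACK=120 numbers used in the retransmission scenarios.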
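TCP retransmission scenarios
Lost ACK scenario:
- Host A sends Seq=92, 8 bytes of data
- Host B replies ACK=100, but the ACK is lost (X)
- Host A times out and retransmits Seq=92, 8 bytes of data
- Host B replies ACK=100 again
Premature timeout scenario:
- Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data (SendBase=92)
- Host B replies ACK=100 and ACK=120
- Host A times out before the ACKs arrive and retransmits Seq=92, 8 bytes of data
- The ACKs then arrive at Host A (SendBase=100, then SendBase=120)
- Host B answers the retransmitted segment with ACK=120 (SendBase=120)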
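TCP retransmission scenarios (2)
Cumulative ACK scenario:
- Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
- Host B replies ACK=100, which is lost (X), and then ACK=120
- ACK=120 arrives at Host A before the timeout; because ACKs are cumulative, it also covers the lost ACK=100, so nothing is retransmitted
- Host A continues with Seq=120, 15 bytes of data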
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action:
- Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
- Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
- Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
- Arrival of segment that partially or completely fills gap
  -> immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
- The timeout period is often relatively long:
  - long delay before resending a lost packet
- Detect lost segments via duplicate ACKs:
  - sender often sends many segments back-to-back
  - if a segment is lost, there will likely be many duplicate ACKs
- TCP fast retransmit: if the sender receives 3 duplicate ACKs for the same data ("triple duplicate ACKs"), resend the unACKed segment with the smallest seq #
  - it is likely that the unACKed segment was lost, so don't wait for the timeout
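TCP fast retransmit (2)
Fast retransmit after sender receipt of triple duplicate ACK:
- Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X)
- Host B replies ACK=100 to the first segment, then sends three more (duplicate) ACK=100s
- On the third duplicate ACK, Host A retransmits Seq=100, 20 bytes of data, before its timeout expires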
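The duplicate-ACK counting behind fast retransmit can be sketched as follows. The function name and the list-of-ACK-numbers input are assumptions made for illustration; a real sender tracks this state per connection as ACKs arrive.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the indices in `acks` at which a fast retransmit would fire.

    `acks` is the sequence of ACK numbers the sender receives, in order.
    A fast retransmit fires when the same ACK number has been repeated
    `threshold` times after its first arrival (the "triple duplicate ACK").
    """
    fired = []
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                fired.append(i)
        else:
            # A new ACK number resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return fired
```

With the ACK stream from the scenario above (ACK=100 four times), the retransmission fires on the fourth ACK, i.e. the third duplicate.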
TCP flow control
- The application may remove data from the TCP socket buffers more slowly than the TCP receiver is delivering it (that is, more slowly than the sender is sending)
- Flow control: the receiver controls the sender, so the sender won't overflow the receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from the sender pass up through the IP code and TCP code (in the OS) into the TCP socket receiver buffers, from which the application process reads.]
TCP flow control (2)
- The receiver "advertises" free buffer space by including the rwnd value in the TCP header of receiver-to-sender segments
  - RcvBuffer size is set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
- The sender limits the amount of unACKed ("in-flight") data to the receiver's rwnd value
- This guarantees that the receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads arrive into RcvBuffer, which is split into buffered data (waiting to go to the application process) and free buffer space (rwnd).]
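A small sketch of the two sides of flow control, assuming (as the buffering figure suggests) that rwnd is simply RcvBuffer minus the data still buffered. The function names and parameters are hypothetical.

```python
def advertised_window(rcv_buffer, buffered):
    """Receiver side: free buffer space to advertise as rwnd.

    Assumption: rwnd = RcvBuffer - buffered data.
    """
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data must fit inside RcvBuffer")
    return rcv_buffer - buffered


def sendable_bytes(last_byte_sent, last_byte_acked, rwnd):
    """Sender side: how many more bytes may be sent right now.

    The sender keeps unACKed ("in-flight") data at or below rwnd.
    """
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)
```

For example, a 4096-byte RcvBuffer holding 1000 unread bytes advertises rwnd = 3096; a sender with 2000 bytes in flight may then send 1096 more.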
Connection Management
Before exchanging data, sender and receiver "handshake":
- agree to establish a connection (each knowing the other is willing to establish a connection)
- agree on connection parameters
Each side (application above, network below) then holds:
- connection state: ESTAB
- connection variables:
  - seq # client-to-server and server-to-client
  - rcvBuffer size at server and client
Client: Socket clientSocket = new Socket(hostname, portNumber);
Server: Socket connectionSocket = welcomeSocket.accept();
Agreeing to establish a connection
2-way handshake:
- human analogy: "Let's talk" / "OK"; both sides are then ESTAB
- protocol: client chooses x and sends req_conn(x); server replies acc_conn(x); both sides are then ESTAB
Q: will a 2-way handshake always work in a network?
- variable delays
- retransmitted messages (e.g., req_conn(x)) due to message loss
- message reordering
- can't "see" other side
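Agreeing to establish a connection (2)
2-way handshake failure scenarios:
Scenario 1: half-open connection
- client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB
- client retransmits req_conn(x)
- connection x completes: client terminates, server forgets x
- the retransmitted req_conn(x) then arrives and the server enters ESTAB again: a half-open connection (no client!)
Scenario 2: duplicate data accepted
- client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB
- client sends data(x+1), which the server accepts
- client retransmits req_conn(x) and data(x+1)
- connection x completes: client terminates, server forgets x
- the retransmitted req_conn(x) arrives and the server enters ESTAB; the retransmitted data(x+1) then arrives and the server accepts it again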
TCP 3-way handshake
Client state and server state both start at LISTEN.
- Client: choose initial seq # x, send TCP SYN message (SYNbit=1, Seq=x); client state becomes SYNSENT
- Server: choose initial seq # y, send TCP SYNACK message, ACKing the SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state becomes SYN RCVD
- Client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state becomes ESTAB
- Server: received ACK(y) indicates client is live; server state becomes ESTAB
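The sequence and acknowledgment numbers in the three segments can be sketched like this. The dictionary representation of a segment is an assumption for illustration only; the modulo reflects that TCP sequence numbers are 32-bit values that wrap around.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq #s x (client) and y (server)."""
    syn = {"SYN": 1, "seq": x}
    synack = {
        "SYN": 1,
        "seq": y,
        "ACK": 1,
        "acknum": (syn["seq"] + 1) % SEQ_SPACE,  # ACKs the client's SYN
    }
    ack = {
        "ACK": 1,
        "acknum": (synack["seq"] + 1) % SEQ_SPACE,  # ACKs the server's SYN
    }
    return [syn, synack, ack]
```

Calling `three_way_handshake(1000, 5000)` yields ACKnum=1001 in the SYNACK and ACKnum=5001 in the final ACK, matching ACKnum=x+1 and ACKnum=y+1 above.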
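TCP 3-way handshake FSM
States: closed, listen, SYN rcvd, SYN sent, ESTAB
Transitions (event / action; Λ means no event or no action):
- closed -> listen (server): Socket connectionSocket = welcomeSocket.accept(); / Λ
- listen -> SYN rcvd (server): SYN(x) / SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
- SYN rcvd -> ESTAB (server): ACK(ACKnum=y+1) / Λ
- closed -> SYN sent (client): Socket clientSocket = new Socket(hostname, portNumber); / SYN(seq=x)
- SYN sent -> ESTAB (client): SYNACK(seq=y, ACKnum=x+1) / ACK(ACKnum=y+1)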
TCP: closing a connection
- Client and server each close their side of the connection
  - send TCP segment with FIN bit = 1
- Respond to a received FIN with an ACK
  - on receiving a FIN, the ACK can be combined with own FIN
- Simultaneous FIN exchanges can be handled
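TCP: closing a connection (2)
Client state and server state both start at ESTAB.
- Client calls clientSocket.close() and sends FINbit=1, seq=x; client state FIN_WAIT_1 (can no longer send, but can receive data)
- Server replies ACKbit=1, ACKnum=x+1; server state CLOSE_WAIT (can still send data); client state FIN_WAIT_2 (wait for server close)
- Server sends FINbit=1, seq=y; server state LAST_ACK (can no longer send data)
- Client replies ACKbit=1, ACKnum=y+1; client state TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
- Server receives the ACK; server state CLOSED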
Principles of congestion control
Congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers (unlimited shared output link buffers)
- output link capacity: R
- no retransmission
- Host A sends original data at rate λ_in; Host B receives throughput λ_out
[Figure: λ_out versus λ_in, both axes up to R/2. Maximum per-connection throughput: R/2.]
[Figure: delay versus λ_in, up to R/2. Large delays as arrival rate λ_in approaches capacity.]
Causes/costs of congestion: scenario 2
- one router, finite buffers (finite shared output link buffers)
- sender retransmission of timed-out packet
  - application-layer input = application-layer output: λ_in = λ_out
  - transport-layer input includes retransmissions: λ'_in ≥ λ_in
- Host A: λ_in is original data; λ'_in is original data plus retransmitted data; Host B receives λ_out
Causes/costs of congestion: scenario 2 (2)
Idealization: perfect knowledge
- sender sends only when router buffers are available (free buffer space)
[Figure: λ_out versus λ_in, both axes up to R/2.]
Causes/costs of congestion: scenario 2 (3)
Idealization: known loss
- packets can be lost, dropped at router due to full buffers (no buffer space)
- sender only resends if packet known to be lost (it keeps a copy)
Causes/costs of congestion: scenario 2 (4)
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
[Figure: λ_out versus λ_in, both axes up to R/2. When sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?).]
Causes/costs of congestion: scenario 2 (5)
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
[Figure: λ_out versus λ_in, both axes up to R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered.]
Causes/costs of congestion: scenario 2 (6)
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
[Figure: λ_out versus λ_in, up to R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered.]
"Costs" of congestion:
- more work (retransmissions) for a given "goodput"
- unneeded retransmissions: link carries multiple copies of a packet, decreasing goodput
Causes/costs of congestion: scenario 3
- four senders (Host A, Host B, Host C, Host D)
- multihop paths
- timeout/retransmit
- finite shared output link buffers
- λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: throughput
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue packets at the upper queue are dropped; blue throughput goes to 0
Causes/costs of congestion: scenario 3 (2)
Another "cost" of congestion:
- when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
[Figure: λ_out versus λ'_in, both axes up to C/2.]
Approaches towards congestion control
Two broad approaches towards congestion control:
- End-end congestion control:
  - no explicit feedback from network
  - congestion inferred from end-system observed loss, delay
  - approach taken by TCP
- Network-assisted congestion control:
  - routers provide feedback to end systems
    - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    - explicit rate for sender to send at
Case study: ATM ABR congestion control
ABR: available bit rate
- "elastic service"
- if sender's path is "underloaded": sender should use available bandwidth
- if sender's path is congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches ("network-assisted")
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact
Case study: ATM ABR congestion control (2)
- two-byte ER (explicit rate) field in RM cell
  - congested switch may lower ER value in cell
  - sender's send rate is thus the maximum supportable rate on the path
- EFCI bit in data cells: set to 1 in congested switch
  - if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell
[Figure: RM cells interspersed with data cells.]
TCP congestion control
TCP congestion control: additive increase, multiplicative decrease
- Approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD sawtooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half.]
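The sawtooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: cwnd is counted in units of MSS, each loop iteration is one RTT, and the set of RTTs in which loss occurs (`loss_rounds`) is supplied by the caller rather than produced by a network model.

```python
def aimd_trace(rounds, loss_rounds, mss=1, initial_cwnd=1):
    """Return the cwnd value at the start of each RTT under AIMD."""
    cwnd = initial_cwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            # Multiplicative decrease: cut cwnd in half (never below 1 MSS).
            cwnd = max(mss, cwnd / 2)
        else:
            # Additive increase: grow by 1 MSS per RTT.
            cwnd += mss
    return trace
```

For instance, `aimd_trace(6, {3})` climbs 1, 2, 3, 4, then drops to 2 after the loss in the fourth RTT and resumes climbing.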
TCP Congestion Control: details
- Sender limits transmission:
    LastByteSent - LastByteAcked ≤ cwnd
- cwnd is a dynamic function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    rate ≈ cwnd / RTT bytes/sec
[Figure: sender sequence number space, showing the last byte ACKed, the bytes sent but not yet ACKed ("in-flight", spanning cwnd), and the last byte sent.]
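Both relations above translate directly into code. The function names are hypothetical, and the rate is the same rough approximation as in the text (one window per RTT), not a measurement.

```python
def may_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one cwnd of data per RTT."""
    return cwnd_bytes / rtt_seconds
```

With 2000 bytes in flight and cwnd = 4000 bytes, a further 1460-byte segment fits; with cwnd = 3000 bytes it does not.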
TCP Slow Start
- When connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- Summary: initial rate is slow, but ramps up exponentially fast
[Figure: Host A sends one segment to Host B in the first RTT, then two segments, then four segments.]
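Why a per-ACK increment doubles cwnd every RTT can be seen in a short sketch. It assumes an idealized, loss-free connection in which every segment sent in one RTT is ACKed individually before the next RTT begins; cwnd is counted in units of MSS.

```python
def slow_start_rounds(rounds, mss=1):
    """Return cwnd at the start of each RTT during loss-free slow start."""
    cwnd = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cwnd)
        acks_this_rtt = cwnd // mss  # one ACK per segment sent this RTT
        for _ in range(acks_this_rtt):
            cwnd += mss              # increment cwnd for every ACK received
    return sizes
```

Four rounds give window sizes of 1, 2, 4 and 8 segments, extending the one, two, four segment pattern in the figure.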
TCP: detecting, reacting to loss
- Loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- Loss indicated by 3 duplicate ACKs: TCP Reno
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half; window then grows linearly
- TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
TCP: switching from slow start to congestion avoidance (CA)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Summary: TCP Congestion Control
States: slow start, congestion avoidance, fast recovery
Initialization (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh (Λ): go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
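The state machine summarized above can be sketched as a small Python class. This is an illustration under assumptions, not a faithful TCP stack: the MSS value of 1460 bytes is assumed, retransmission and transmission of new segments are not modeled, and the class and method names are hypothetical. Only the cwnd, ssthresh, dupACKcount and state updates are tracked.

```python
MSS = 1460  # assumed segment size in bytes, for illustration only


class RenoSender:
    """Tracks cwnd/ssthresh/state for the slow start, congestion
    avoidance and fast recovery FSM (no actual segments are sent)."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS
            self.dup_acks = 0
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * MSS / self.cwnd  # about 1 MSS per RTT
            self.dup_acks = 0
        else:  # fast_recovery
            self.cwnd = self.ssthresh
            self.dup_acks = 0
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"  # missing segment would be retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"  # missing segment would be retransmitted here
```

Driving it with three new ACKs, three duplicate ACKs, one new ACK and a timeout walks through slow start, fast recovery, congestion avoidance and back to slow start.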
TCP throughput
- avg. TCP throughput as a function of window size and RTT?
  - ignore slow start, assume there is always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. throughput is 3/4 W per RTT
    avg TCP throughput = (3/4) * W / RTT bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W.]
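The 3/4 W figure is just the mean of a window that ramps linearly from W/2 to W. A quick numeric check, assuming W is an even number of segments and the window grows by one segment per RTT (function names are hypothetical):

```python
def average_window(w):
    """Mean of the window values W/2, W/2 + 1, ..., W (W even, in segments)."""
    values = range(w // 2, w + 1)
    return sum(values) / len(values)


def average_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec using the 3/4 W per RTT estimate."""
    return 0.75 * w_bytes / rtt_seconds
```

For W = 8 the ramp is 4, 5, 6, 7, 8, whose mean is 6, which is 3/4 of 8.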
TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
- to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10 (a very small loss rate!)
- new versions of TCP for high-speed
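The two numbers in this example can be checked with a few lines of arithmetic. The sketch below assumes throughput is given in bits/sec and MSS in bytes (hence the factor of 8); the function names are hypothetical.

```python
def window_segments(target_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(target_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * target_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)       # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)  # about 2 * 10^-10
```

The window works out to 10^9 bits per RTT divided by 12,000 bits per segment, and the loss rate to (1.464 * 10^-5)^2, in line with the figures quoted above.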
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
Two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2).]
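The convergence argument can be checked with a toy simulation. It assumes two synchronized flows that both see loss whenever their combined rate exceeds the link capacity, which is an idealization; the function name and units are hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Simulate two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both flows decrease their window by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both flows increase additively.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = two_flow_aimd(1, 60, capacity=100, steps=500)
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so after a handful of loss events the flows end up with nearly equal shares regardless of where they started.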
Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections:
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
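The parallel-connection arithmetic can be made explicit. The sketch assumes the idealized fairness goal, in which the bottleneck is split equally among all connections; the function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections,
    assuming every connection gets an equal share of the bottleneck."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, slightly more than half.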
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122, RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study: ATM ABR congestion control
- Case study: ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control: additive increase, multiplicative decre
- TCP Congestion Control: details
- TCP Slow Start
- TCP: detecting, reacting to loss
- TCP: switching from slow start to CA
- Summary: TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- TCP Fairness
- Why is TCP fair
- Fairness (more)
-
Transport Layer
3-2
TCP reliable data transferTCP creates rdt service on
top of IPrsquos unreliable service pipelined segments cumulative acks single retransmission timer
retransmissions triggered by timeout events duplicate acks
letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control
congestion control
Transport Layer
3-3
TCP sender eventsdata rcvd from app
create segment with seq
seq is byte-stream number of first data byte in segment
start timer if not already running think of timer as for oldest
unacked segment expiration interval TimeOutInterval
timeout
retransmit segment that caused timeout
restart timer ack rcvd
if ack acknowledges previously unacked segments update what is known to
be ACKed start timer if there are still
unacked segments
Transport Layer
3-4TCP sender (simplified)
waitfor event
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
L
create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer
data received from application above
retransmit not-yet-acked segment with smallest seq start timer
timeout
if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) start timer else stop timer
ACK received with ACK field value y
Transport Layer
3-5TCP retransmission scenarios
lost ACK scenario
Host BHost A
Seq=92 8 bytes of data
ACK=100
Seq=92 8 bytes of data
Xtimeo
ut
ACK=100
premature timeout
Host BHost A
Seq=92 8 bytes of data
ACK=100
Seq=92 8bytes of data
timeo
utACK=120
Seq=100 20 bytes of data
ACK=120
SendBase=100SendBase=120
SendBase=120
SendBase=92
Transport Layer
3-6TCP retransmission scenarios
X
cumulative ACK
Host BHost A
Seq=92 8 bytes of data
ACK=100
Seq=120 15 bytes of data
timeo
ut
Seq=100 20 bytes of data
ACK=120
Transport Layer
3-7TCP ACK generation [RFC 1122 RFC 2581]
event at receiver
arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
arrival of in-order segment withexpected seq One other segment has ACK pending
arrival of out-of-order segmenthigher-than-expect seq Gap detected
arrival of segment that partially or completely fills gap
TCP receiver action
delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
immediately send single cumulative ACK ACKing both in-order segments
immediately send duplicate ACK indicating seq of next expected byte
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer
3-8TCP fast retransmittime-out period often
relatively long long delay before
resending lost packet
detect lost segments via duplicate ACKs sender often sends many
segments back-to-back if segment is lost there will
likely be many duplicate ACKs
if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq likely that unacked
segment lost so donrsquot wait for timeout
TCP fast retransmit
(ldquotriple duplicate ACKsrdquo)
Transport Layer
3-9
X
fast retransmit after sender receipt of triple duplicate ACK
Host BHost A
Seq=92 8 bytes of data
ACK=100
timeo
ut
ACK=100
ACK=100
ACK=100
TCP fast retransmit
Seq=100 20 bytes of data
Seq=100 20 bytes of data
Transport Layer
3-10
TCP flow controlapplicationprocess
TCP socketreceiver buffers
TCPcode
IPcode
applicationOS
receiver protocol stack
application may remove data from
TCP socket buffers hellip
hellip slower than TCP receiver is delivering(sender is sending)
from sender
receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast
flow control
Transport Layer
3-11
TCP flow control
buffered data
free buffer spacerwnd
RcvBuffer
TCP segment payloads
to application processreceiver ldquoadvertisesrdquo free
buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket
options (typical default is 4096 bytes)
many operating systems autoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value
guarantees receive buffer will not overflow
receiver-side buffering
Transport Layer
3-12Connection Managementbefore exchanging data senderreceiver
ldquohandshakerdquoagree to establish connection (each knowing the
other willing to establish connection)agree on connection parameters
connection state ESTABconnection variables
seq client-to-server server-to-clientrcvBuffer size at serverclient
application
network
connection state ESTABconnection Variables
seq client-to-server server-to-clientrcvBuffer size at serverclient
application
network
Socket clientSocket = newSocket(hostnameport
number)
Socket connectionSocket = welcomeSocketaccept()
Transport Layer
3-13
Q will 2-way handshake always work in network
variable delaysretransmitted messages (eg
req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side
2-way handshake
Letrsquos talkOK ESTAB
ESTAB
choose x req_conn(x)ESTAB
ESTABacc_conn(x)
Agreeing to establish a connection
Transport Layer
3-14
Agreeing to establish a connection
2-way handshake failure scenarios
retransmitreq_conn(
x)
ESTAB
req_conn(x)
half open connection(no client)
client terminat
esserverforgets x
connection x completes
retransmitreq_conn(
x)
ESTAB
req_conn(x)
data(x+1)
retransmitdata(x+1)
acceptdata(x+1)
choose x req_conn(x)ESTAB
ESTAB
acc_conn(x)
client terminat
es
ESTAB
choose x req_conn(x)ESTAB
acc_conn(x)
data(x+1) acceptdata(x+1)
connection x completes server
forgets x
Transport Layer
3-15TCP 3-way handshake
SYNbit=1 Seq=x
choose init seq num xsend TCP SYN msg
ESTAB
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
choose init seq num ysend TCP SYNACKmsg acking SYN
ACKbit=1 ACKnum=y+1
received SYNACK(x) indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data received ACK(y)
indicates client is live
SYNSENT
ESTAB
SYN RCVD
client state
LISTENserver state
LISTEN
Transport Layer
3-16
TCP 3-way handshake FSM
closed
L
listen
SYNrcvd
SYNsent
ESTAB
Socket clientSocket = newSocket(hostnameport
number)
SYN(seq=x)
Socket connectionSocket = welcomeSocketaccept()
SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client
SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)
L
Transport Layer
3-17TCP closing a connection
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer
3-18
FIN_WAIT_2
CLOSE_WAIT
FINbit=1 seq=y
ACKbit=1 ACKnum=y+1
ACKbit=1 ACKnum=x+1 wait for server
closecan stillsend data
can no longersend data
LAST_ACK
CLOSED
TIMED_WAIT
timed wait for 2max
segment lifetime
CLOSED
TCP closing a connection
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data
clientSocketclose()
client state server state
ESTABESTAB
Transport Layer
3-19Principles of congestion control
Transport Layer
3-20
congestion informally ldquotoo many sources sending too much data too
fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router
buffers)a top-10 problem
Principles of congestion control
Transport Layer
3-21
Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission
maximum per-connection throughput R2
unlimited shared output link buffers
Host A
original data lin
Host B
throughput lout
R2
R2
l out
lin R2de
lay
lin large delays as arrival
rate lin approaches capacity
Transport Layer
3-22
one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout
transport-layer input includes retransmissions lin lin
finite shared output link buffers
Host A
lin original data
Host B
loutlin original data plus retransmitted data
Causescosts of congestion scenario 2
Transport Layer
3-23
idealization perfect knowledge
sender sends only when router buffers available
finite shared output link buffers
lin original dataloutlin original data plus
retransmitted data
copy
free buffer space
R2
R2
l out
lin
Causescosts of congestion scenario 2
Host B
A
Transport Layer
3-24
lin original dataloutlin original data plus
retransmitted data
copy
no buffer space
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
Causescosts of congestion scenario 2
A
Host B
Transport Layer
3-25
lin original dataloutlin original data plus
retransmitted data
free buffer space
Causescosts of congestion scenario 2
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
R2
R2lin
l out
when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)
A
Host B
Transport Layer
3-26
A
lin loutlincopy
free buffer space
timeout
R2
R2lin
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
Host B
Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Causescosts of congestion scenario 2
Transport Layer
3-27
R2
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple
copies of pkt decreasing goodput
R2lin
Causescosts of congestion scenario 2 Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Summary: TCP Congestion Control
[Figure: finite state machine with three states: slow start, congestion avoidance, fast recovery]
initial: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start
- new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
congestion avoidance
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
fast recovery
- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
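The transitions summarized above can be sketched as a small event-driven class. This is an illustration, not a production TCP: it tracks only cwnd, ssthresh, and the duplicate-ACK count, leaves retransmission to the caller, and uses an assumed MSS of 1460 bytes. The class and method names are hypothetical.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; illustrative value (assumed)


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh             # deflate window, leave recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                      # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS                      # each dup ACK means a segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"          # caller retransmits the missing segment

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                 # caller retransmits the missing segment


sender = RenoSender()
for _ in range(3):
    sender.on_new_ack()
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cwnd, sender.ssthresh)
```

Keeping the state as a plain string keeps the sketch short; an enum would be the safer choice in real code.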
TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
[Figure: sawtooth of window size over time, oscillating between W/2 and W]
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{\mathrm{RTT}}$ bytes/sec
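The 3/4 factor is just the mean of a window that climbs linearly from W/2 to W. The sketch below checks that numerically and then applies the formula. The function names and sample values are assumptions for illustration.

```python
def sawtooth_mean(w: int) -> float:
    """Mean window over one sawtooth cycle rising in unit steps from W/2 to W."""
    samples = [w / 2 + i for i in range(w // 2 + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes: float, rtt_seconds: float) -> float:
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_seconds


# Illustrative values (assumed): W = 100 units, then W = 100,000 bytes with a 100 ms RTT.
print(sawtooth_mean(100) / 100)
print(avg_throughput(100_000, 0.100))
```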
TCP Futures: TCP over "long, fat pipes"
- example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT} \sqrt{L}}$
- to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
- new versions of TCP for high-speed
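The two numbers in this example can be reproduced directly from the formula above. A minimal sketch follows; the function names are hypothetical, and throughput is expressed in bits per second so that it can be compared with the 10 Gbps target.

```python
import math


def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_prob: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve the same formula for L given a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(round(w), f"{loss:.1e}")
```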
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput plotted against Connection 1 throughput, each axis running to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
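The convergence argument can be checked with a toy simulation of two AIMD flows. It assumes both flows have the same RTT and see every loss at the same time, which is a strong simplification. The function name and starting rates are illustrative assumptions.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int) -> tuple[float, float]:
    """Two flows on one link: both add 1 per step, both halve when the link is overloaded."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # synchronized loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap constant
    return x1, x2


# Start far from the fair share (assumed values): 80 vs 10 on a link of capacity 100.
a, b = aimd_two_flows(80.0, 10.0, capacity=100.0, steps=1000)
print(f"gap after 1000 steps: {abs(a - b):.6f}")
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the rates drift toward the equal-share line.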
Fairness (more)
Fairness and UDP
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
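The parallel-connection arithmetic above is a one-line ratio if every connection gets an equal share of the link. A small sketch, with a hypothetical function name:

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of R the new app gets if every connection gets an equal share."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1), app_share(9, 11))
```

With 11 of 20 connections the exact share is 11/20 of R, which the slide rounds to R/2.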
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122, RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake: FSM
- TCP: closing a connection
- TCP: closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study: ATM ABR congestion control
- Case study: ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control: additive increase, multiplicative decrease
- TCP Congestion Control: details
- TCP Slow Start
- TCP: detecting, reacting to loss
- TCP: switching from slow start to CA
- Summary: TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- TCP Fairness
- Why is TCP fair?
- Fairness (more)
Transport Layer
3-4TCP sender (simplified)
waitfor event
NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
L
create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer
data received from application above
retransmit not-yet-acked segment with smallest seq start timer
timeout
if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) start timer else stop timer
ACK received with ACK field value y
Transport Layer
3-5TCP retransmission scenarios
lost ACK scenario
Host BHost A
Seq=92 8 bytes of data
ACK=100
Seq=92 8 bytes of data
Xtimeo
ut
ACK=100
premature timeout
Host BHost A
Seq=92 8 bytes of data
ACK=100
Seq=92 8bytes of data
timeo
utACK=120
Seq=100 20 bytes of data
ACK=120
SendBase=100SendBase=120
SendBase=120
SendBase=92
Transport Layer
3-6TCP retransmission scenarios
X
cumulative ACK
Host BHost A
Seq=92 8 bytes of data
ACK=100
Seq=120 15 bytes of data
timeo
ut
Seq=100 20 bytes of data
ACK=120
Transport Layer
3-7TCP ACK generation [RFC 1122 RFC 2581]
event at receiver
arrival of in-order segment withexpected seq All data up toexpected seq already ACKed
arrival of in-order segment withexpected seq One other segment has ACK pending
arrival of out-of-order segmenthigher-than-expect seq Gap detected
arrival of segment that partially or completely fills gap
TCP receiver action
delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK
immediately send single cumulative ACK ACKing both in-order segments
immediately send duplicate ACK indicating seq of next expected byte
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer
3-8TCP fast retransmittime-out period often
relatively long long delay before
resending lost packet
detect lost segments via duplicate ACKs sender often sends many
segments back-to-back if segment is lost there will
likely be many duplicate ACKs
if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq likely that unacked
segment lost so donrsquot wait for timeout
TCP fast retransmit
(ldquotriple duplicate ACKsrdquo)
Transport Layer
3-9
X
fast retransmit after sender receipt of triple duplicate ACK
Host BHost A
Seq=92 8 bytes of data
ACK=100
timeo
ut
ACK=100
ACK=100
ACK=100
TCP fast retransmit
Seq=100 20 bytes of data
Seq=100 20 bytes of data
Transport Layer
3-10
TCP flow controlapplicationprocess
TCP socketreceiver buffers
TCPcode
IPcode
applicationOS
receiver protocol stack
application may remove data from
TCP socket buffers hellip
hellip slower than TCP receiver is delivering(sender is sending)
from sender
receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast
flow control
Transport Layer
3-11
TCP flow control
buffered data
free buffer spacerwnd
RcvBuffer
TCP segment payloads
to application processreceiver ldquoadvertisesrdquo free
buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket
options (typical default is 4096 bytes)
many operating systems autoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value
guarantees receive buffer will not overflow
receiver-side buffering
Transport Layer
3-12Connection Managementbefore exchanging data senderreceiver
ldquohandshakerdquoagree to establish connection (each knowing the
other willing to establish connection)agree on connection parameters
connection state ESTABconnection variables
seq client-to-server server-to-clientrcvBuffer size at serverclient
application
network
connection state ESTABconnection Variables
seq client-to-server server-to-clientrcvBuffer size at serverclient
application
network
Socket clientSocket = newSocket(hostnameport
number)
Socket connectionSocket = welcomeSocketaccept()
Transport Layer
3-13
Q will 2-way handshake always work in network
variable delaysretransmitted messages (eg
req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side
2-way handshake
Letrsquos talkOK ESTAB
ESTAB
choose x req_conn(x)ESTAB
ESTABacc_conn(x)
Agreeing to establish a connection
Transport Layer
3-14
Agreeing to establish a connection
2-way handshake failure scenarios
retransmitreq_conn(
x)
ESTAB
req_conn(x)
half open connection(no client)
client terminat
esserverforgets x
connection x completes
retransmitreq_conn(
x)
ESTAB
req_conn(x)
data(x+1)
retransmitdata(x+1)
acceptdata(x+1)
choose x req_conn(x)ESTAB
ESTAB
acc_conn(x)
client terminat
es
ESTAB
choose x req_conn(x)ESTAB
acc_conn(x)
data(x+1) acceptdata(x+1)
connection x completes server
forgets x
Transport Layer
3-15TCP 3-way handshake
SYNbit=1 Seq=x
choose init seq num xsend TCP SYN msg
ESTAB
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
choose init seq num ysend TCP SYNACKmsg acking SYN
ACKbit=1 ACKnum=y+1
received SYNACK(x) indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data received ACK(y)
indicates client is live
SYNSENT
ESTAB
SYN RCVD
client state
LISTENserver state
LISTEN
Transport Layer
3-16
TCP 3-way handshake FSM
closed
L
listen
SYNrcvd
SYNsent
ESTAB
Socket clientSocket = newSocket(hostnameport
number)
SYN(seq=x)
Socket connectionSocket = welcomeSocketaccept()
SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client
SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)
L
Transport Layer
3-17TCP closing a connection
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer
3-18
FIN_WAIT_2
CLOSE_WAIT
FINbit=1 seq=y
ACKbit=1 ACKnum=y+1
ACKbit=1 ACKnum=x+1 wait for server
closecan stillsend data
can no longersend data
LAST_ACK
CLOSED
TIMED_WAIT
timed wait for 2max
segment lifetime
CLOSED
TCP closing a connection
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data
clientSocketclose()
client state server state
ESTABESTAB
Transport Layer
3-19Principles of congestion control
Transport Layer
3-20
congestion informally ldquotoo many sources sending too much data too
fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router
buffers)a top-10 problem
Principles of congestion control
Transport Layer
3-21
Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission
maximum per-connection throughput R2
unlimited shared output link buffers
Host A
original data lin
Host B
throughput lout
R2
R2
l out
lin R2de
lay
lin large delays as arrival
rate lin approaches capacity
Transport Layer
3-22
one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout
transport-layer input includes retransmissions lin lin
finite shared output link buffers
Host A
lin original data
Host B
loutlin original data plus retransmitted data
Causescosts of congestion scenario 2
Transport Layer
3-23
idealization perfect knowledge
sender sends only when router buffers available
finite shared output link buffers
lin original dataloutlin original data plus
retransmitted data
copy
free buffer space
R2
R2
l out
lin
Causescosts of congestion scenario 2
Host B
A
Transport Layer
3-24
lin original dataloutlin original data plus
retransmitted data
copy
no buffer space
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
Causescosts of congestion scenario 2
A
Host B
Transport Layer
3-25
lin original dataloutlin original data plus
retransmitted data
free buffer space
Causescosts of congestion scenario 2
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
R2
R2lin
l out
when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)
A
Host B
Transport Layer
3-26
A
lin loutlincopy
free buffer space
timeout
R2
R2lin
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
Host B
Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Causescosts of congestion scenario 2
Transport Layer
3-27
R2
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple
copies of pkt decreasing goodput
R2lin
Causescosts of congestion scenario 2 Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
3-35
TCP Congestion Control: details
- sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), last byte sent, and the span cwnd]
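The window check and the rate estimate can be written directly. The sketch below assumes byte-valued counters and ignores the receiver window (rwnd); `can_send` and `approx_rate` are illustrative names, not part of any TCP API.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight_after = (last_byte_sent + nbytes) - last_byte_acked
    return in_flight_after <= cwnd


def approx_rate(cwnd_bytes, rtt_s):
    """rate ~ cwnd / RTT, in bytes per second."""
    return cwnd_bytes / rtt_s


# 3000 bytes in flight, cwnd = 4000 bytes: exactly 1000 more bytes fit.
print(can_send(5000, 2000, 4000, 1000))  # prints True
print(can_send(5000, 2000, 4000, 1001))  # prints False
print(approx_rate(10000, 0.25))          # prints 40000.0
```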
3-36
TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT, along a time axis]
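Why a per-ACK increment doubles cwnd every RTT can be seen in a short sketch. It assumes no loss and one ACK per segment (no delayed ACKs), with cwnd counted in MSS; `slow_start_rounds` is a hypothetical name.

```python
def slow_start_rounds(rounds):
    """Return the number of segments sent in each RTT during slow start."""
    cwnd = 1  # initially cwnd = 1 MSS
    sizes = []
    for _ in range(rounds):
        sizes.append(cwnd)
        acks = cwnd            # assumption: every segment sent is ACKed
        for _ in range(acks):
            cwnd += 1          # increment cwnd by 1 MSS per ACK
    return sizes


print(slow_start_rounds(4))  # prints [1, 2, 4, 8]
```

Each RTT delivers cwnd ACKs, and each ACK adds 1 MSS, so the window doubles: one, two, four segments as in the figure.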
3-37
TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP RENO
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half, window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate ACKs)
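The two reactions differ only in what cwnd becomes right after the loss. A small sketch, with cwnd in MSS; the function name and the event strings `"timeout"` and `"3dupacks"` are made up for this example.

```python
def cwnd_after_loss(cwnd, event, variant="reno"):
    """Return the new cwnd (in MSS) immediately after a loss event."""
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + event)
    if event == "timeout" or variant == "tahoe":
        return 1                 # back to 1 MSS
    return max(cwnd // 2, 1)     # Reno on 3 duplicate ACKs: cut in half


print(cwnd_after_loss(16, "timeout"))            # prints 1
print(cwnd_after_loss(16, "3dupacks", "reno"))   # prints 8
print(cwnd_after_loss(16, "3dupacks", "tahoe"))  # prints 1
```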
3-38
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
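A sketch of window growth after a timeout, with cwnd in MSS and one step per RTT. Capping the doubling at ssthresh is a simplification chosen here for readability; `growth_after_timeout` is a hypothetical name.

```python
def growth_after_timeout(cwnd_before_loss, rounds):
    """Return cwnd per RTT after a timeout: exponential to ssthresh, then linear."""
    ssthresh = cwnd_before_loss // 2   # 1/2 of cwnd just before the loss event
    cwnd = 1
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start (capped: an assumption)
        else:
            cwnd += 1                       # congestion avoidance
    return trace


print(growth_after_timeout(16, 7))  # prints [1, 2, 4, 8, 9, 10, 11]
```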
3-39
Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery. Λ marks a transition with no triggering event or no action.]
start:
- Λ: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
- new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: Λ; go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
congestion avoidance:
- new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
fast recovery:
- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start
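The three-state machine can be sketched as a small Python class. This is an illustrative model, not a real TCP implementation: cwnd and ssthresh are counted in MSS (so the initial ssthresh of 64 here is an assumed stand-in for 64 KB), retransmission and segment sending are omitted, and the class and method names are hypothetical.

```python
class RenoSender:
    """Toy model of the slow start / congestion avoidance / fast recovery FSM."""

    def __init__(self, ssthresh=64.0):
        self.state = "slow_start"
        self.cwnd = 1.0            # in MSS
        self.ssthresh = ssthresh   # in MSS (assumption; the slide uses 64 KB)
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0                   # cwnd = cwnd + MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd       # cwnd = cwnd + MSS * (MSS/cwnd)
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1.0                   # inflate window per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast_recovery"       # missing segment is retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow_start"              # missing segment is retransmitted here


s = RenoSender(ssthresh=4.0)
for _ in range(4):
    s.on_new_ack()        # slow start until cwnd exceeds ssthresh
for _ in range(3):
    s.on_dup_ack()        # third duplicate ACK triggers fast recovery
print(s.state, s.cwnd, s.ssthresh)  # prints fast_recovery 5.5 2.5
```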
3-40
TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
avg TCP thruput $= \frac{3}{4} \frac{W}{RTT}$ bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W]
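The 3/4 factor follows from the window climbing linearly from W/2 to W between losses, so its average is the midpoint. A quick numeric check in Python, with hypothetical helper names and an idealized sawtooth sampled once per RTT:

```python
def sawtooth_average(w):
    """Average window over one sawtooth cycle rising linearly from w/2 to w."""
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_s):
    """avg TCP throughput = (3/4) * W / RTT, in bytes per second."""
    return 0.75 * w_bytes / rtt_s


print(sawtooth_average(100))          # prints 75.0
print(avg_throughput(100000, 0.25))   # prints 300000.0
```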
3-41
TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput $= \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
- to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
- new versions of TCP for high-speed
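Both numbers on this slide can be reproduced with a few lines of Python. The function names are illustrative; throughput is taken in bits per second and MSS in bytes, hence the factor of 8.

```python
import math


def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * 8 * mss_bytes / (rtt_s * math.sqrt(loss))


def required_loss(throughput_bps, rtt_s, mss_bytes):
    """Solve the same formula for the loss probability L."""
    return (1.22 * 8 * mss_bytes / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)   # about 83,333 segments
loss = required_loss(10e9, 0.1, 1500)           # about 2.1e-10
print(int(w), loss)
```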
3-42
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
3-43
Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line.]
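The convergence argument can be checked with a tiny simulation. It assumes an idealized setting that the slide implies but does not spell out: both sessions have the same RTT, both see a loss whenever their combined rate exceeds the capacity, and rates are in arbitrary units. `two_flow_aimd` is a hypothetical name.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both decrease by a factor of 2 (the gap between them halves)
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both grow equally (the gap stays the same)
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = two_flow_aimd(1.0, 60.0, capacity=100.0, rounds=500)
print(abs(a - b) < 0.01)  # prints True
```

Additive increase leaves the difference between the two rates unchanged, while every loss halves it, so the rates drift toward the equal bandwidth share.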
3-44
Fairness (more)
Fairness and UDP
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets roughly R/2
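The parallel-connection arithmetic, assuming every TCP connection on the link gets an equal share of R. `app_share` is an illustrative helper; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def app_share(existing_conns, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing_conns + new_conns)


print(app_share(9, 1))   # prints 1/10
print(app_share(9, 11))  # prints 11/20
```

With 11 of 20 connections, the new app gets 11R/20, slightly more than half the link.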
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122 RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- TCP Fairness
- Why is TCP fair
- Fairness (more)
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
3-35 TCP congestion control: details
Sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, a function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd / RTT bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight", spanning cwnd), and last byte sent]
3-36 TCP slow start
When connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
Summary: initial rate is slow but ramps up exponentially fast
[Figure: over time, Host A sends one segment to Host B, then two segments, then four segments, one RTT apart]
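A short sketch of why "increment cwnd for every ACK" amounts to doubling per RTT. It assumes one ACK per segment (no delayed ACKs) and no loss; the MSS value and function name are chosen only for illustration.

```python
MSS = 1460  # bytes; an assumed segment size for illustration


def slow_start_rounds(num_rtts, mss=MSS):
    """Return cwnd in bytes at the start of each RTT during slow start."""
    cwnd = mss                        # initially cwnd = 1 MSS
    history = []
    for _ in range(num_rtts):
        history.append(cwnd)
        acks_this_rtt = cwnd // mss   # one ACK per segment sent this round
        for _ in range(acks_this_rtt):
            cwnd += mss               # +1 MSS for every ACK received
    return history


print([c // MSS for c in slow_start_rounds(5)])  # [1, 2, 4, 8, 16]
```

Each round sends cwnd/MSS segments and receives that many ACKs, so the window grows by its own size every RTT.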
3-37 TCP: detecting, reacting to loss
Loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
Loss indicated by 3 duplicate ACKs: TCP Reno
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
3-38 TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
3-39 Summary: TCP congestion control
[FSM with three states: slow start, congestion avoidance, fast recovery. Λ denotes "no event".]
Initial transition (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
  new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  cwnd > ssthresh (Λ): go to congestion avoidance
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
Congestion avoidance:
  new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Fast recovery:
  duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start
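The state machine summarized on this slide can be sketched as a small Python class. This is an illustrative model only, not a real TCP implementation: the class and method names are hypothetical, the MSS value is assumed, "+ 3" is read as 3 MSS, and actual segment (re)transmission is left to the caller.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed value for illustration


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate window, leave recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                     # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                     # inflate window per dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"         # caller retransmits missing segment

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                # caller retransmits missing segment


s = RenoSender()
for _ in range(3):
    s.on_new_ack()        # slow start: cwnd grows from 1 MSS to 4 MSS
for _ in range(3):
    s.on_dup_ack()        # third duplicate ACK triggers fast recovery
print(s.state, s.cwnd / MSS, s.ssthresh / MSS)  # fast_recovery 5.0 2.0
```

Keeping the three event handlers separate mirrors the three event types in the FSM (new ACK, duplicate ACK, timeout), with the current state selecting the action.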
3-40 TCP throughput
Average TCP throughput as a function of window size and RTT (ignore slow start, assume there is always data to send):
  W: window size (measured in bytes) where loss occurs
  average window size (number of in-flight bytes) is 3/4 W
  average throughput is 3/4 W per RTT
  avg TCP throughput = $\frac{3}{4} \frac{W}{RTT}$ bytes/sec
[Figure: sawtooth window oscillating between W/2 and W]
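The 3/4 factor follows from the sawtooth shape, under the stated assumption that the window ramps linearly from W/2 up to W between losses (the symbol $\bar{W}$ for the average window is introduced here only for the derivation):

```latex
\bar{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP throughput} = \frac{\bar{W}}{RTT} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
```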
3-41 TCP futures: TCP over "long, fat pipes"
Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
Throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
  to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
New versions of TCP for high-speed
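Both numbers in this example can be checked with a few lines of Python. The function names are made up for this sketch; the formulas are the window/RTT relation and the loss-rate expression quoted above, solved for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(round(w))          # 83333
print(f"{loss:.2e}")     # about 2.1e-10
```

The result of roughly 2·10^-10 means at most about one lost segment in every five billion.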
3-42 TCP fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
3-43 Why is TCP fair?
Two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
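The convergence argument can be illustrated numerically. The sketch below is a deliberately simplified model, not real TCP: it assumes both flows increase in lockstep and both see a loss in the same round whenever their combined rate exceeds the capacity. All names and numbers are illustrative.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Simulate two synchronized AIMD flows sharing one bottleneck.

    Assumption: both flows detect loss in the same round whenever
    their combined rate exceeds `capacity`.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase, slope 1
    return x1, x2


x1, x2 = two_flow_aimd(x1=1.0, x2=80.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))   # the gap shrinks toward 0
```

Additive increase leaves the difference between the two rates unchanged, while every multiplicative decrease halves it, so the rates drift toward the equal-share line.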
3-44 Fairness (more)
Fairness and UDP:
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections:
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
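The arithmetic behind the parallel-connection example, as a small sketch. It assumes every TCP connection on the link ends up with an equal share, which is the idealized fairness goal rather than a guarantee; the function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening `new_connections`,
    assuming every TCP connection gets an equal share of the bottleneck."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

With 11 of 20 connections the new app gets 11/20 of R, which the slide rounds to R/2.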
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122 RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- TCP Fairness
- Why is TCP fair
- Fairness (more)
3-7 TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
1. Event: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
   Action: delayed ACK; wait up to 500 ms for next segment; if no next segment, send ACK
2. Event: arrival of in-order segment with expected seq #; one other segment has ACK pending
   Action: immediately send single cumulative ACK, ACKing both in-order segments
3. Event: arrival of out-of-order segment with higher-than-expected seq #; gap detected
   Action: immediately send duplicate ACK, indicating seq # of next expected byte
4. Event: arrival of segment that partially or completely fills gap
   Action: immediately send ACK, provided that segment starts at lower end of gap
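The four rules can be read as a decision function. The sketch below is only an illustration of the table: the function and parameter names are hypothetical, and the final branch (a segment below the expected sequence number) is not covered by the table and is an assumption.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_exists):
    """Pick the receiver action for one arriving segment.

    seg_seq:      sequence number of the arriving segment
    expected_seq: next in-order sequence number the receiver expects
    ack_pending:  True if an earlier in-order segment still awaits its ACK
    gap_exists:   True if out-of-order data is buffered beyond expected_seq
    """
    if seg_seq > expected_seq:
        return "send duplicate ACK now"        # out of order: gap detected
    if seg_seq == expected_seq and gap_exists:
        return "send ACK now"                  # segment starts at lower end of gap
    if seg_seq == expected_seq and ack_pending:
        return "send cumulative ACK now"       # ACKs both in-order segments
    if seg_seq == expected_seq:
        return "delay ACK up to 500 ms"        # wait for a possible next segment
    return "send duplicate ACK now"            # assumption: old segment, re-ACK


print(receiver_action(100, 100, ack_pending=False, gap_exists=False))
print(receiver_action(120, 100, ack_pending=False, gap_exists=False))
```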
3-8 TCP fast retransmit
Time-out period often relatively long:
  long delay before resending lost packet
Detect lost segments via duplicate ACKs:
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
3-9 TCP fast retransmit
[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); the Seq=100 segment is lost (X). Host B returns ACK=100 four times. Before the timeout expires, Host A resends Seq=100 (20 bytes of data).]
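The sender-side bookkeeping in this scenario can be sketched as a duplicate-ACK counter. The class and attribute names are hypothetical, and the sketch only decides when to retransmit; it does not model timers or actual sending.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and reports when to fast-retransmit."""

    def __init__(self, send_base):
        self.send_base = send_base   # seq # of oldest unacked byte
        self.dup_acks = 0

    def on_ack(self, ack_num):
        """Return the seq # to retransmit, or None if nothing to resend."""
        if ack_num > self.send_base:     # ACK covers new data
            self.send_base = ack_num
            self.dup_acks = 0
            return None
        self.dup_acks += 1               # duplicate ACK for send_base
        if self.dup_acks == 3:
            return self.send_base        # resend unacked segment with smallest seq #
        return None


sender = FastRetransmitSender(send_base=92)
results = [sender.on_ack(100) for _ in range(4)]
print(results)  # [None, None, None, 100]
```

The first ACK=100 acknowledges the 8 bytes starting at 92; the next three are duplicates, and the third duplicate triggers the retransmission of Seq=100.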
3-10 TCP flow control
Application may remove data from TCP socket buffers more slowly than TCP receiver is delivering (sender is sending)
Flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code in the OS into the TCP socket receiver buffers, from which the application process reads.]
3-11 TCP flow control
Receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
Sender limits amount of unacked ("in-flight") data to receiver's rwnd value
Guarantees receive buffer will not overflow
[Figure: receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive, and data leaves to the application process.]
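A minimal sketch of the two sides of flow control. The helper names and the byte counters (last byte received, last byte read, and so on) are assumptions following common textbook convention; they do not appear on the slide.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space that the receiver advertises."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, advertised_rwnd, nbytes):
    """True if sending nbytes keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_rwnd


window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                       # 2096
print(sender_may_send(5000, 4000, window, 1000))    # True
print(sender_may_send(5000, 4000, window, 1460))    # False
```

Because the sender never lets in-flight data exceed the advertised value, the receive buffer cannot overflow even if the application stops reading.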
3-12 Connection management
Before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
Client side and server side each hold:
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
Client: Socket clientSocket = new Socket("hostname", "port number");
Server: Socket connectionSocket = welcomeSocket.accept();
3-13 Agreeing to establish a connection
2-way handshake:
  [Figure: "Let's talk" answered by "OK", after which both sides are in ESTAB; in protocol terms, the client chooses x and sends req_conn(x), the server replies acc_conn(x), and both sides reach ESTAB]
Q: will 2-way handshake always work in network?
  variable delays
  retransmitted messages (e.g., req_conn(x)) due to message loss
  message reordering
  can't "see" other side
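3-14 Agreeing to establish a connection
2-way handshake failure scenarios:
[Figure, first scenario: half-open connection (no client). Client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB. A retransmitted req_conn(x) is still in the network. Connection x completes, client terminates, server forgets x. The retransmitted req_conn(x) then arrives and the server enters ESTAB again, with no client.]
[Figure, second scenario: duplicate data accepted. Client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB and sends data(x+1), which the server accepts. Connection x completes, client terminates, server forgets x. Retransmitted req_conn(x) and retransmitted data(x+1) then arrive; the server enters ESTAB and accepts data(x+1) again.]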
3-15 TCP 3-way handshake
Client state: LISTEN -> SYNSENT -> ESTAB
Server state: LISTEN -> SYN RCVD -> ESTAB
1. Client: choose init seq num, x; send TCP SYN msg
     SYNbit=1, Seq=x
2. Server: choose init seq num, y; send TCP SYNACK msg, acking SYN
     SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. Client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
     ACKbit=1, ACKnum=y+1
4. Server: received ACK(y) indicates client is live
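The sequence and acknowledgment numbers exchanged in the handshake can be traced with a short sketch. The dictionary layout and function name are illustrative only; real segments carry many more header fields. Sequence numbers are 32 bits wide, hence the modulo.

```python
import random


def three_way_handshake(x=None, y=None):
    """Return the three handshake segments as dicts (illustrative fields only)."""
    x = random.randrange(2**32) if x is None else x   # client initial seq num
    y = random.randrange(2**32) if y is None else y   # server initial seq num
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (x + 1) % 2**32}
    ack = {"ACK": 1, "acknum": (y + 1) % 2**32}
    return [syn, synack, ack]


segments = three_way_handshake(x=1000, y=5000)
for segment in segments:
    print(segment)
```

Each side proves it received the other's initial sequence number by acknowledging that number plus one.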
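3-16 TCP 3-way handshake: FSM
States: closed, listen, SYN rcvd, SYN sent, ESTAB (Λ denotes "no action")
Server side:
  closed -> listen: on Socket connectionSocket = welcomeSocket.accept(); action Λ
  listen -> SYN rcvd: on SYN(x); send SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1); action Λ
Client side:
  closed -> SYN sent: on Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1); send ACK(ACKnum=y+1)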
3-17 TCP: closing a connection
Client, server each close their side of connection
  send TCP segment with FIN bit = 1
Respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
Simultaneous FIN exchanges can be handled
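3-18 TCP: closing a connection
Client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
Server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. Client calls clientSocket.close(), sends FINbit=1, seq=x, and enters FIN_WAIT_1: can no longer send but can receive data
2. Server replies ACKbit=1, ACKnum=x+1 and enters CLOSE_WAIT: can still send data; client enters FIN_WAIT_2 and waits for server close
3. Server sends FINbit=1, seq=y and enters LAST_ACK: can no longer send data
4. Client replies ACKbit=1, ACKnum=y+1 and enters TIMED_WAIT: timed wait for 2 * max segment lifetime, then CLOSED; server enters CLOSED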
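The two state sequences in this exchange can be written as transition tables. This is a sketch of the client-initiated close only; the event labels and the run helper are hypothetical, and simultaneous close is not modeled.

```python
CLIENT_CLOSE = {
    ("ESTAB", "close()"): "FIN_WAIT_1",              # sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",        # waits for server close
    ("FIN_WAIT_2", "recv FIN"): "TIMED_WAIT",        # sends ACK
    ("TIMED_WAIT", "2*MSL timer expires"): "CLOSED",
}

SERVER_CLOSE = {
    ("ESTAB", "recv FIN"): "CLOSE_WAIT",             # sends ACK, can still send data
    ("CLOSE_WAIT", "close()"): "LAST_ACK",           # sends own FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(table, state, events):
    """Apply events in order and return the final state."""
    for event in events:
        state = table[(state, event)]
    return state


print(run(CLIENT_CLOSE, "ESTAB",
          ["close()", "recv ACK", "recv FIN", "2*MSL timer expires"]))  # CLOSED
print(run(SERVER_CLOSE, "ESTAB", ["recv FIN", "close()", "recv ACK"]))  # CLOSED
```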
3-19 Principles of congestion control
3-20 Principles of congestion control
Congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
3-21 Causes/costs of congestion: scenario 1
Two senders, two receivers
One router, infinite buffers
Output link capacity: R
No retransmission
[Figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out]
[Plot: λ_out versus λ_in, both axes marked at R/2; maximum per-connection throughput is R/2]
[Plot: delay versus λ_in, axis marked at R/2; large delays as arrival rate λ_in approaches capacity]
3-22 Causes/costs of congestion: scenario 2
One router, finite buffers
Sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[Figure: Host A and Host B send through finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: throughput]
3-23 Causes/costs of congestion: scenario 2
Idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: Host A keeps a copy of each packet and sends toward Host B only when the finite shared output link buffers have free buffer space. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: throughput]
[Plot: λ_out versus λ_in, both axes marked at R/2]
3-24 Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of the packet sent toward Host B; the packet arrives when there is no buffer space. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: throughput]
3-25 Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A sends toward Host B while there is free buffer space. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: throughput]
[Plot: λ_out versus sending rate, both axes marked at R/2. When sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)]
3-26 Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Figure: Host A keeps a copy, times out, and resends while there is free buffer space; both copies travel toward Host B. Labels: λ_in, λ'_in, λ_out]
[Plot: λ_out versus sending rate, both axes marked at R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered]
3-27 Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Plot: λ_out versus sending rate, axes marked at R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered]
"Costs" of congestion:
  more work (retrans) for given "goodput"
  unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput
3-28 Causes/costs of congestion: scenario 3
Four senders
Multihop paths
Timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C, and D connected through routers with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: throughput]
3-29 Causes/costs of congestion: scenario 3
Another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Plot: λ_out versus λ'_in, both axes marked at C/2]
Transport Layer
3-8TCP fast retransmittime-out period often
relatively long long delay before
resending lost packet
detect lost segments via duplicate ACKs sender often sends many
segments back-to-back if segment is lost there will
likely be many duplicate ACKs
if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq likely that unacked
segment lost so donrsquot wait for timeout
TCP fast retransmit
(ldquotriple duplicate ACKsrdquo)
Transport Layer
3-9
X
fast retransmit after sender receipt of triple duplicate ACK
Host BHost A
Seq=92 8 bytes of data
ACK=100
timeo
ut
ACK=100
ACK=100
ACK=100
TCP fast retransmit
Seq=100 20 bytes of data
Seq=100 20 bytes of data
Transport Layer
3-10
TCP flow controlapplicationprocess
TCP socketreceiver buffers
TCPcode
IPcode
applicationOS
receiver protocol stack
application may remove data from
TCP socket buffers hellip
hellip slower than TCP receiver is delivering(sender is sending)
from sender
receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast
flow control
Transport Layer
3-11
TCP flow control
buffered data
free buffer spacerwnd
RcvBuffer
TCP segment payloads
to application processreceiver ldquoadvertisesrdquo free
buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket
options (typical default is 4096 bytes)
many operating systems autoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value
guarantees receive buffer will not overflow
receiver-side buffering
Transport Layer
3-12Connection Managementbefore exchanging data senderreceiver
ldquohandshakerdquoagree to establish connection (each knowing the
other willing to establish connection)agree on connection parameters
connection state ESTABconnection variables
seq client-to-server server-to-clientrcvBuffer size at serverclient
application
network
connection state ESTABconnection Variables
seq client-to-server server-to-clientrcvBuffer size at serverclient
application
network
Socket clientSocket = newSocket(hostnameport
number)
Socket connectionSocket = welcomeSocketaccept()
Transport Layer
3-13
Q will 2-way handshake always work in network
variable delaysretransmitted messages (eg
req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side
2-way handshake
Letrsquos talkOK ESTAB
ESTAB
choose x req_conn(x)ESTAB
ESTABacc_conn(x)
Agreeing to establish a connection
Transport Layer
3-14
Agreeing to establish a connection
2-way handshake failure scenarios
retransmitreq_conn(
x)
ESTAB
req_conn(x)
half open connection(no client)
client terminat
esserverforgets x
connection x completes
retransmitreq_conn(
x)
ESTAB
req_conn(x)
data(x+1)
retransmitdata(x+1)
acceptdata(x+1)
choose x req_conn(x)ESTAB
ESTAB
acc_conn(x)
client terminat
es
ESTAB
choose x req_conn(x)ESTAB
acc_conn(x)
data(x+1) acceptdata(x+1)
connection x completes server
forgets x
Transport Layer
3-15TCP 3-way handshake
SYNbit=1 Seq=x
choose init seq num xsend TCP SYN msg
ESTAB
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
choose init seq num ysend TCP SYNACKmsg acking SYN
ACKbit=1 ACKnum=y+1
received SYNACK(x) indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data received ACK(y)
indicates client is live
SYNSENT
ESTAB
SYN RCVD
client state
LISTENserver state
LISTEN
Transport Layer
3-16
TCP 3-way handshake FSM
closed
L
listen
SYNrcvd
SYNsent
ESTAB
Socket clientSocket = newSocket(hostnameport
number)
SYN(seq=x)
Socket connectionSocket = welcomeSocketaccept()
SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client
SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)
L
Transport Layer
3-17TCP closing a connection
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer
3-18
FIN_WAIT_2
CLOSE_WAIT
FINbit=1 seq=y
ACKbit=1 ACKnum=y+1
ACKbit=1 ACKnum=x+1 wait for server
closecan stillsend data
can no longersend data
LAST_ACK
CLOSED
TIMED_WAIT
timed wait for 2max
segment lifetime
CLOSED
TCP closing a connection
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data
clientSocketclose()
client state server state
ESTABESTAB
Transport Layer
3-19Principles of congestion control
Transport Layer
3-20
congestion informally ldquotoo many sources sending too much data too
fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router
buffers)a top-10 problem
Principles of congestion control
Transport Layer
3-21
Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission
maximum per-connection throughput R2
unlimited shared output link buffers
Host A
original data lin
Host B
throughput lout
R2
R2
l out
lin R2de
lay
lin large delays as arrival
rate lin approaches capacity
Transport Layer
3-22
one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout
transport-layer input includes retransmissions lin lin
finite shared output link buffers
Host A
lin original data
Host B
loutlin original data plus retransmitted data
Causescosts of congestion scenario 2
Transport Layer
3-23
idealization perfect knowledge
sender sends only when router buffers available
finite shared output link buffers
lin original dataloutlin original data plus
retransmitted data
copy
free buffer space
R2
R2
l out
lin
Causescosts of congestion scenario 2
Host B
A
Transport Layer
3-24
lin original dataloutlin original data plus
retransmitted data
copy
no buffer space
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
Causescosts of congestion scenario 2
A
Host B
Transport Layer
3-25
lin original dataloutlin original data plus
retransmitted data
free buffer space
Causescosts of congestion scenario 2
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
R2
R2lin
l out
when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)
A
Host B
Transport Layer
3-26
A
lin loutlincopy
free buffer space
timeout
R2
R2lin
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
Host B
Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Causescosts of congestion scenario 2
Transport Layer
3-27
R2
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple
copies of pkt decreasing goodput
R2lin
Causescosts of congestion scenario 2 Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
3-38 TCP: switching from slow start to CA (congestion avoidance)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
3-39 Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery. Λ denotes "no event" or "no action".]
Initial transition into slow start (event: Λ):
- cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
Slow start:
- new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: Λ, go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment, go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, stay in slow start
Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment, go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
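The state machine on this slide can be written as a small event-driven class. This is a sketch, not a TCP implementation: the class and method names are hypothetical, cwnd is kept in units of MSS, the initial ssthresh of 64 KB is replaced by an assumed 64 MSS, "ssthresh + 3" is read as 3 MSS, and retransmission itself is only noted in comments.

```python
MSS = 1  # all window values below are in units of MSS (assumption)


class RenoSender:
    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * MSS   # slide uses 64 KB; 64 MSS is an assumption
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * (MSS / self.cwnd)
        else:  # fast_recovery: deflate window and resume linear growth
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_acks = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"   # missing segment is retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"          # missing segment is retransmitted here
```

A caller would feed it events in arrival order, for example `s = RenoSender(); s.on_new_ack(); s.on_dup_ack()`, and inspect `s.state` and `s.cwnd` after each one.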
3-40 TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
[Figure: sawtooth of window size oscillating between W/2 and W.]
avg TCP thruput = $\frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
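The 3/4 factor is just the mean of a window that ramps linearly from W/2 to W. The sketch below checks this numerically for one sawtooth cycle; the function names are hypothetical and W is assumed even so that W/2 is an integer.

```python
def average_window(w):
    """Mean window over one sawtooth cycle, sampled once per RTT from W/2 to W."""
    samples = range(w // 2, w + 1)
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_seconds):
    """Average TCP throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_seconds


print(average_window(100))          # 75.0, i.e. 3/4 of W
print(avg_throughput(100000, 0.5))
```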
3-41 TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = $\frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
- to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
- new versions of TCP for high-speed
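Both numbers on this slide follow from short arithmetic, reproduced below as a check. The function and constant names are illustrative; the only inputs are the slide's own values (1500-byte segments, 100 ms RTT, 10 Gbps target) and the throughput formula above solved for L.

```python
import math

MSS_BITS = 1500 * 8      # segment size in bits
RTT_S = 0.1              # 100 ms
TARGET_BPS = 10e9        # 10 Gbps


def window_segments(throughput_bps, rtt_s, mss_bits):
    """In-flight segments needed to sustain the throughput: W = rate * RTT / MSS."""
    return throughput_bps * rtt_s / mss_bits


def mathis_throughput(mss_bits, rtt_s, loss):
    """Throughput from the loss-probability formula: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bits / (rtt_s * math.sqrt(loss))


def required_loss(throughput_bps, rtt_s, mss_bits):
    """The same formula solved for L."""
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


print(int(window_segments(TARGET_BPS, RTT_S, MSS_BITS)))   # 83333
print(required_loss(TARGET_BPS, RTT_S, MSS_BITS))           # about 2.1e-10
```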
3-42 TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
3-43 Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
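The convergence argument can be checked with a toy simulation of two flows. This sketch assumes both flows see every loss at the same time and have equal RTTs, which real connections do not guarantee; the function name and parameters are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1    # congestion avoidance: additive increase
    return x1, x2


a, b = aimd_two_flows(80, 10, capacity=100, rounds=200)
print(a, b, abs(a - b))
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward the equal-share line.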
3-44 Fairness (more)
Fairness and UDP
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2
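The parallel-connection example is a one-line fraction, shown here under the assumption that the link is shared equally per connection. The function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

The second case gives 11/20 of R, which the slide rounds to R/2.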
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122 RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion scenario 1
- Causes/costs of congestion scenario 2
- Causes/costs of congestion scenario 2 (2)
- Causes/costs of congestion scenario 2 (3)
- Causes/costs of congestion scenario 2 (4)
- Causes/costs of congestion scenario 2 (5)
- Causes/costs of congestion scenario 2 (6)
- Causes/costs of congestion scenario 3
- Causes/costs of congestion scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures TCP over "long fat pipes"
- TCP Fairness
- Why is TCP fair
- Fairness (more)
3-9 TCP fast retransmit
[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B returns ACK=100 four times. Host A retransmits Seq=100, 20 bytes of data, before the timeout expires.]
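The trigger condition in the figure can be sketched as a duplicate-ACK counter. This is an illustrative sketch with a hypothetical function name; it only reports where a fast retransmit would fire and does not model timers or the retransmission itself.

```python
def fast_retransmit_points(acks):
    """Return the ACK numbers at which a fast retransmit would be triggered.

    acks: cumulative ACK numbers in the order the sender receives them.
    """
    retransmits = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:              # triple duplicate ACK
                retransmits.append(ack)     # resend segment starting at byte `ack`
        else:
            last_ack, dup_count = ack, 0
    return retransmits


print(fast_retransmit_points([100, 100, 100, 100]))  # [100]
```

The first ACK=100 is the original; the next three are the duplicates that trigger the retransmission of the segment starting at byte 100.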
3-10 TCP flow control
- application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
- flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass up through IP code and TCP code (in the OS) into the TCP socket receiver buffers, from which the application process reads.]
3-11 TCP flow control
- receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  - RcvBuffer size set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
- sender limits amount of unacked ("in-flight") data to receiver's rwnd value
- guarantees receive buffer will not overflow
[Figure: receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive from the sender, and data leaves to the application process.]
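The receiver-side bookkeeping can be sketched as follows. The class, method, and function names are hypothetical; the only quantities taken from the slide are RcvBuffer, rwnd as the free buffer space, and the rule that in-flight data must not exceed rwnd.

```python
class Receiver:
    def __init__(self, rcv_buffer=4096):
        self.rcv_buffer = rcv_buffer   # RcvBuffer size in bytes
        self.buffered = 0              # data received but not yet read by the app

    @property
    def rwnd(self):
        """Free buffer space advertised to the sender."""
        return self.rcv_buffer - self.buffered

    def receive(self, nbytes):
        if nbytes > self.rwnd:
            raise OverflowError("sender exceeded advertised rwnd")
        self.buffered += nbytes

    def app_read(self, nbytes):
        nbytes = min(nbytes, self.buffered)
        self.buffered -= nbytes
        return nbytes


def sendable(in_flight, rwnd):
    """Bytes the sender may still transmit without exceeding rwnd."""
    return max(0, rwnd - in_flight)


r = Receiver()
r.receive(3000)
print(r.rwnd)               # 1096 bytes of free buffer space
print(sendable(500, r.rwnd))
```

As long as the sender respects `sendable`, `receive` never raises, which is the "receive buffer will not overflow" guarantee.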
3-12 Connection Management
before exchanging data, sender/receiver "handshake":
- agree to establish connection (each knowing the other willing to establish connection)
- agree on connection parameters
[Figure: client and server each hold, between application and network: connection state: ESTAB; connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client.]
Client: Socket clientSocket = new Socket(hostname, portNumber);
Server: Socket connectionSocket = welcomeSocket.accept();
3-13 Agreeing to establish a connection
Q: will 2-way handshake always work in network?
- variable delays
- retransmitted messages (e.g. req_conn(x)) due to message loss
- message reordering
- can't "see" other side
[Figure: 2-way handshake. Human analogy: "Let's talk" / "OK", after which both sides are ESTAB. Protocol: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB.]
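3-14 Agreeing to establish a connection
2-way handshake failure scenarios:
[Figure, scenario 1: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB. Client retransmits req_conn(x). Connection x completes, client terminates, server forgets x. The retransmitted req_conn(x) then arrives and the server enters ESTAB again: half open connection (no client!).]
[Figure, scenario 2: same handshake, after which the client sends data(x+1) and the server accepts data(x+1). Client retransmits req_conn(x) and retransmits data(x+1). Connection x completes, client terminates, server forgets x. The retransmitted req_conn(x) and data(x+1) then arrive: server enters ESTAB and accepts data(x+1) again.]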
3-15 TCP 3-way handshake
[Figure: client state and server state timelines, both starting in LISTEN.]
- client (LISTEN to SYNSENT): choose init seq num, x; send TCP SYN msg: SYNbit=1, Seq=x
- server (LISTEN to SYN RCVD): choose init seq num, y; send TCP SYNACK msg, acking SYN: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
- client (SYNSENT to ESTAB): received SYNACK(x) indicates server is live; send ACK for SYNACK: ACKbit=1, ACKnum=y+1; this segment may contain client-to-server data
- server (SYN RCVD to ESTAB): received ACK(y) indicates client is live
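The sequence and acknowledgment numbers in the three segments can be traced with a small sketch. The function name and the dictionary field names are hypothetical stand-ins for TCP header fields; no real packets are built.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments for initial seq nums x (client), y (server)."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

With x=100 and y=300, the SYNACK carries ACKnum=101 and the final ACK carries ACKnum=301, matching ACKnum=x+1 and ACKnum=y+1 on the slide.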
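3-16 TCP 3-way handshake: FSM
[FSM with states: closed, listen, SYN rcvd, SYN sent, ESTAB. Λ denotes "no event" or "no action".]
- closed to listen (server): Socket connectionSocket = welcomeSocket.accept(); action: Λ
- listen to SYN rcvd: event SYN(x); action: SYNACK(seq=y,ACKnum=x+1), create new socket for communication back to client
- SYN rcvd to ESTAB: event ACK(ACKnum=y+1); action: Λ
- closed to SYN sent (client): Socket clientSocket = new Socket(hostname, portNumber); action: SYN(seq=x)
- SYN sent to ESTAB: event SYNACK(seq=y,ACKnum=x+1); action: ACK(ACKnum=y+1)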
3-17 TCP: closing a connection
- client, server each close their side of connection
  - send TCP segment with FIN bit = 1
- respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled
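3-18 TCP: closing a connection
[Figure: client state and server state timelines, both starting in ESTAB.]
- client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
- server: replies ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
- client: enters FIN_WAIT_2 (wait for server close)
- server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
- client: replies ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
- server: on receiving the ACK, enters CLOSED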
3-19 Principles of congestion control
3-20 Principles of congestion control
congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
3-21 Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- output link capacity: R
- no retransmission
- maximum per-connection throughput: R/2
- large delays as arrival rate, λin, approaches capacity
[Figure: Host A and Host B send original data at rate λin into unlimited shared output link buffers; throughput is λout. Plot 1: λout versus λin, rising to R/2 at λin = R/2. Plot 2: delay versus λin, growing large as λin approaches R/2.]
3-22 Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of timed-out packet
  - application-layer input = application-layer output: λin = λout
  - transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A and Host B send into finite shared output link buffers. λin: original data; λ'in: original data, plus retransmitted data; λout: output.]
3-23 Causes/costs of congestion: scenario 2
idealization: perfect knowledge
- sender sends only when router buffers available
[Figure: Host A keeps a copy of each packet and sends toward Host B through finite shared output link buffers with free buffer space. λin: original data; λ'in: original data, plus retransmitted data. Plot: λout versus λin, with both axes marked at R/2.]
3-24 Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
[Figure: Host A keeps a copy and sends toward Host B; the router has no buffer space. λin: original data; λ'in: original data, plus retransmitted data.]
3-25 Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
[Figure: Host A sends toward Host B; the router has free buffer space. λin: original data; λ'in: original data, plus retransmitted data. Plot: λout versus λ'in, with both axes marked at R/2. When sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?).]
3-26 Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
[Figure: Host A times out and sends a second copy toward Host B while the router has free buffer space. Plot: λout versus λ'in, with both axes marked at R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered.]
3-27 Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
"costs" of congestion:
- more work (retrans) for given "goodput"
- unneeded retransmissions: link carries multiple copies of pkt
  - decreasing goodput
[Plot: λout versus λ'in, with both axes marked at R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered.]
3-28 Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers. λin: original data; λ'in: original data, plus retransmitted data; λout: output.]
3-29 Causes/costs of congestion: scenario 3
another "cost" of congestion:
- when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Plot: λout versus λ'in, with both axes marked at C/2.]
3-30 Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at
3-31 Case study: ATM ABR congestion control
ABR: available bit rate:
- "elastic service"
- if sender's path "underloaded": sender should use available bandwidth
- if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches ("network-assisted")
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact
Transport Layer
3-11
TCP flow control
- receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  - RcvBuffer size set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
- sender limits amount of unacked ("in-flight") data to receiver's rwnd value
- guarantees receive buffer will not overflow
[figure: receiver-side buffering; TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to the application process]
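The receiver and sender rules above can be sketched in a few lines. The bookkeeping variables `last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, and `last_byte_acked` are assumptions introduced for this illustration; the slide itself names only rwnd and RcvBuffer.

```python
def advertised_rwnd(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free receive-buffer space that the receiver would advertise as rwnd."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the application
    return rcv_buffer - buffered


def sender_may_send(n_bytes: int, last_byte_sent: int, last_byte_acked: int, rwnd: int) -> bool:
    """True if sending n_bytes more keeps unacked ("in-flight") data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + n_bytes <= rwnd


rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd, sender_may_send(1000, 5000, 4000, rwnd))
```

Because the sender never has more than rwnd bytes outstanding, the data it has in flight always fits in the free space the receiver reported.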
Transport Layer
3-12
Connection Management
before exchanging data, sender/receiver "handshake":
- agree to establish connection (each knowing the other willing to establish connection)
- agree on connection parameters
[figure: client and server hosts, each with application and network layers; both hold connection state: ESTAB and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server and client]
client: Socket clientSocket = new Socket(hostname, portNumber);
server: Socket connectionSocket = welcomeSocket.accept();
Transport Layer
3-13
Agreeing to establish a connection
2-way handshake:
[figure: two 2-way handshakes; a human exchange ("Let's talk" / "OK") and a network exchange in which the client chooses x and sends req_conn(x) and the server replies acc_conn(x); each side enters ESTAB]
Q: will 2-way handshake always work in network?
- variable delays
- retransmitted messages (e.g., req_conn(x)) due to message loss
- message reordering
- can't "see" other side
Transport Layer
3-14
Agreeing to establish a connection
2-way handshake failure scenarios:
[figure, scenario 1: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB; connection x completes, client terminates, server forgets x; a retransmitted req_conn(x) then arrives and the server enters ESTAB again: half-open connection (no client)]
[figure, scenario 2: same handshake, followed by data(x+1), which the server accepts; connection x completes, client terminates, server forgets x; a retransmitted req_conn(x) and a retransmitted data(x+1) then arrive; the server enters ESTAB again and accepts the duplicate data(x+1)]
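The half-open scenario can be reproduced with a toy model of the server side. The class and method names below are hypothetical, and the model captures only the one property that matters here: after forgetting x, the server cannot tell a delayed duplicate req_conn(x) from a fresh request.

```python
class TwoWayServer:
    """Toy server for a 2-way handshake: any req_conn puts it in ESTAB."""

    def __init__(self):
        self.established = set()

    def on_req_conn(self, x):
        # No memory of past connections, so a stale duplicate looks like a new request.
        self.established.add(x)
        return ("acc_conn", x)

    def on_close(self, x):
        # Connection x completes; server forgets x.
        self.established.discard(x)


server = TwoWayServer()
server.on_req_conn(7)    # original request
server.on_close(7)       # connection completes, client terminates
server.on_req_conn(7)    # delayed retransmission of req_conn(7) arrives
half_open = 7 in server.established
print(half_open)
```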
Transport Layer
3-15
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
- client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state becomes SYNSENT
- server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state becomes SYN RCVD
- client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state becomes ESTAB
- server: received ACK(y) indicates client is live; server state becomes ESTAB
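The sequence and acknowledgement numbers in the three segments can be traced with a small sketch. Representing segments as dictionaries is an assumption made for illustration; the field names mirror the labels on the slide.

```python
def three_way_handshake(x: int, y: int):
    """Return the SYN, SYNACK, and ACK segments as dicts of header fields."""
    syn = {"SYNbit": 1, "Seq": x}
    # The server acknowledges the SYN by asking for byte x+1.
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    # The client acknowledges the SYNACK by asking for byte y+1.
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```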
Transport Layer
3-16
TCP 3-way handshake FSM
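[figure: FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB; transitions below are written as event / action, with Λ meaning none]
- closed to listen (server): Socket connectionSocket = welcomeSocket.accept(); / Λ
- closed to SYN sent (client): Socket clientSocket = new Socket(hostname, portNumber); / SYN(seq=x)
- listen to SYN rcvd: SYN(x) / SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client
- SYN sent to ESTAB: SYNACK(seq=y, ACKnum=x+1) / ACK(ACKnum=y+1)
- SYN rcvd to ESTAB: ACK(ACKnum=y+1) / Λ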
Transport Layer
3-17
TCP: closing a connection
- client, server each close their side of connection
  - send TCP segment with FIN bit = 1
- respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled
Transport Layer
3-18
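TCP: closing a connection
client state: ESTAB; server state: ESTAB
- client: clientSocket.close(); send FINbit=1, seq=x; state FIN_WAIT_1 (can no longer send but can receive data)
- server: reply ACKbit=1, ACKnum=x+1; state CLOSE_WAIT (can still send data)
- client: state FIN_WAIT_2 (wait for server close)
- server: send FINbit=1, seq=y; state LAST_ACK (can no longer send data)
- client: reply ACKbit=1, ACKnum=y+1; state TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
- server: on receiving the ACK, state CLOSED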
Transport Layer
3-19
Principles of congestion control
Transport Layer
3-20
Principles of congestion control
congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem
Transport Layer
3-21
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- output link capacity: R
- no retransmission
- maximum per-connection throughput: R/2
- large delays as arrival rate, λ_in, approaches capacity
[figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out]
[figure: two plots against λ_in, up to R/2; λ_out rises to R/2, and delay grows sharply as λ_in nears R/2]
Transport Layer
3-22
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of timed-out packet
  - application-layer input = application-layer output: λ_in = λ_out
  - transport-layer input includes retransmissions: λ'_in ≥ λ_in
[figure: Host A and Host B send into finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Transport Layer
3-23
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
- sender sends only when router buffers available
[figure: Host A keeps a copy and sends into finite shared output link buffers that have free buffer space; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out toward Host B]
[figure: plot of λ_out against λ_in, both axes up to R/2]
Transport Layer
3-24
Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
[figure: Host A keeps a copy; a packet reaches the router when there is no buffer space; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out toward Host B]
Transport Layer
3-25
Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
[figure: Host A resends into router buffers that now have free buffer space; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out toward Host B]
[figure: plot of λ_out against λ'_in, both axes up to R/2]
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
Transport Layer
3-26
Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
[figure: Host A times out and resends a copy although the router has free buffer space; λ_in, λ'_in, λ_out toward Host B]
[figure: plot of λ_out against λ'_in, both axes up to R/2]
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
Transport Layer
3-27
Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
[figure: plot of λ_out against λ'_in, both axes up to R/2]
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
- more work (retrans) for given "goodput"
- unneeded retransmissions: link carries multiple copies of pkt
  - decreasing goodput
Transport Layer
3-28
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[figure: Hosts A, B, C, and D connected through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Transport Layer
3-29
Causes/costs of congestion: scenario 3
another "cost" of congestion:
- when packet dropped, any upstream transmission capacity used for that packet was wasted
[figure: plot of λ_out against λ'_in, both axes up to C/2]
Transport Layer
3-30
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at
Transport Layer
3-31
Case study: ATM ABR congestion control
ABR: available bit rate:
- "elastic service"
- if sender's path "underloaded": sender should use available bandwidth
- if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches ("network-assisted")
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact
Transport Layer
3-32
Case study: ATM ABR congestion control
- two-byte ER (explicit rate) field in RM cell
  - congested switch may lower ER value in cell
  - senders' send rate thus max supportable rate on path
- EFCI bit in data cells: set to 1 in congested switch
  - if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
[figure legend: RM cell, data cell]
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control: additive increase multiplicative decrease
- approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss
[figure: cwnd (TCP sender congestion window size) plotted against time; AIMD sawtooth behavior, probing for bandwidth: additively increase window size until loss occurs (then cut window in half)]
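The sawtooth can be generated with a few lines of code. This is a minimal sketch that assumes a loss is detected whenever cwnd reaches a fixed level (`loss_at`, a hypothetical parameter); in a real network the loss point varies with competing traffic.

```python
def aimd_trace(cwnd_mss: float, loss_at: float, rtts: int):
    """cwnd (in MSS) after each RTT; a loss is assumed whenever cwnd has reached loss_at."""
    trace = []
    for _ in range(rtts):
        if cwnd_mss >= loss_at:
            cwnd_mss = cwnd_mss / 2      # multiplicative decrease: cut cwnd in half
        else:
            cwnd_mss = cwnd_mss + 1      # additive increase: 1 MSS every RTT
        trace.append(cwnd_mss)
    return trace


print(aimd_trace(cwnd_mss=6, loss_at=8, rtts=6))
```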
Transport Layer
3-35
TCP Congestion Control: details
- sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{cwnd}{RTT}$ bytes/sec
[figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), and last byte sent; cwnd spans the in-flight bytes]
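A quick numeric reading of the rate approximation, as a sketch: the window of ten 1460-byte segments and the 100 ms RTT are assumed example values, and the function name is hypothetical.

```python
def approx_rate_bytes_per_sec(cwnd_bytes: float, rtt_s: float) -> float:
    """Rough TCP sending rate: one window of cwnd bytes per round-trip time."""
    return cwnd_bytes / rtt_s


# Assumed example: cwnd of ten 1460-byte segments, RTT of 100 ms.
rate = approx_rate_bytes_per_sec(10 * 1460, 0.1)
print(f"{rate:.0f} bytes/sec, or {rate * 8 / 1e6:.3f} Mbps")
```

Doubling cwnd or halving the RTT doubles the rate, which is why the window is the quantity TCP adjusts.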
Transport Layer
3-36
TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
[figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT, along a time axis]
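The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be made explicit with a short sketch. It assumes one ACK per segment (no delayed ACKs) and no loss; the function name is hypothetical and cwnd is counted in MSS.

```python
def slow_start_cwnd(rtts: int, mss: int = 1):
    """cwnd at the start of each RTT during slow start, beginning at 1 MSS."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss     # one ACK comes back per segment sent in this RTT
        cwnd += acks * mss     # +1 MSS per ACK, so cwnd doubles over the RTT
        history.append(cwnd)
    return history


print(slow_start_cwnd(3))   # one, two, four, then eight segments
```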
Transport Layer
3-37
TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP RENO
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half, window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
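The difference between the two reactions can be summarized in one function. This is a simplified sketch: cwnd is in MSS, the threshold is taken as half of cwnd at the loss, and the temporary window inflation during Reno's recovery is left out. The function and argument names are hypothetical.

```python
def react_to_loss(cwnd: float, event: str, variant: str = "reno"):
    """Return (new_cwnd, threshold) in MSS after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    threshold = cwnd / 2
    if event == "timeout" or variant == "tahoe":
        return 1, threshold            # restart from 1 MSS, grow exponentially to threshold
    return threshold, threshold        # Reno on 3 dup ACKs: cut cwnd in half, grow linearly


print(react_to_loss(16, "timeout"))
print(react_to_loss(16, "3dupacks"))
print(react_to_loss(16, "3dupacks", variant="tahoe"))
```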
Transport Layer
3-38
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer
3-39
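Summary: TCP Congestion Control
[figure: FSM with states slow start, congestion avoidance, fast recovery; transitions below are written as event / action, with Λ meaning none]
initial: Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
- new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s), as allowed
- duplicate ACK / dupACKcount++
- cwnd > ssthresh / Λ; go to congestion avoidance
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
- dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
congestion avoidance:
- new ACK / cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s), as allowed
- duplicate ACK / dupACKcount++
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
- dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
fast recovery:
- duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s), as allowed
- new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start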
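The state machine summarized on this slide can be sketched as a small class. This is an illustrative model, not a real TCP stack: it tracks only cwnd, ssthresh, dupACKcount, and the state, and leaves out the actual transmission and retransmission of segments. The class name and the MSS value of 1460 bytes are assumptions.

```python
MSS = 1460  # assumed segment size in bytes


class RenoSender:
    """Window bookkeeping for slow start, congestion avoidance, and fast recovery."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                  # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                           # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd         # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                         # fast retransmit would happen here
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):                              # same reaction in every state
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding the object a sequence of `on_new_ack()`, `on_dup_ack()`, and `on_timeout()` calls and printing `state` and `cwnd` after each one traces a path through the diagram.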
Transport Layer
3-13
Q will 2-way handshake always work in network
variable delaysretransmitted messages (eg
req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side
2-way handshake
Letrsquos talkOK ESTAB
ESTAB
choose x req_conn(x)ESTAB
ESTABacc_conn(x)
Agreeing to establish a connection
Transport Layer
3-14
Agreeing to establish a connection
2-way handshake failure scenarios
retransmitreq_conn(
x)
ESTAB
req_conn(x)
half open connection(no client)
client terminat
esserverforgets x
connection x completes
retransmitreq_conn(
x)
ESTAB
req_conn(x)
data(x+1)
retransmitdata(x+1)
acceptdata(x+1)
choose x req_conn(x)ESTAB
ESTAB
acc_conn(x)
client terminat
es
ESTAB
choose x req_conn(x)ESTAB
acc_conn(x)
data(x+1) acceptdata(x+1)
connection x completes server
forgets x
Transport Layer
3-15TCP 3-way handshake
SYNbit=1 Seq=x
choose init seq num xsend TCP SYN msg
ESTAB
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
choose init seq num ysend TCP SYNACKmsg acking SYN
ACKbit=1 ACKnum=y+1
received SYNACK(x) indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data received ACK(y)
indicates client is live
SYNSENT
ESTAB
SYN RCVD
client state
LISTENserver state
LISTEN
Transport Layer
3-16
TCP 3-way handshake FSM
closed
L
listen
SYNrcvd
SYNsent
ESTAB
Socket clientSocket = newSocket(hostnameport
number)
SYN(seq=x)
Socket connectionSocket = welcomeSocketaccept()
SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client
SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)
L
Transport Layer
3-17TCP closing a connection
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer
3-18
FIN_WAIT_2
CLOSE_WAIT
FINbit=1 seq=y
ACKbit=1 ACKnum=y+1
ACKbit=1 ACKnum=x+1 wait for server
closecan stillsend data
can no longersend data
LAST_ACK
CLOSED
TIMED_WAIT
timed wait for 2max
segment lifetime
CLOSED
TCP closing a connection
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data
clientSocketclose()
client state server state
ESTABESTAB
Transport Layer
3-19Principles of congestion control
Transport Layer
3-20
congestion informally ldquotoo many sources sending too much data too
fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router
buffers)a top-10 problem
Principles of congestion control
Transport Layer
3-21
Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission
maximum per-connection throughput R2
unlimited shared output link buffers
Host A
original data lin
Host B
throughput lout
R2
R2
l out
lin R2de
lay
lin large delays as arrival
rate lin approaches capacity
Transport Layer
3-22
one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout
transport-layer input includes retransmissions lin lin
finite shared output link buffers
Host A
lin original data
Host B
loutlin original data plus retransmitted data
Causescosts of congestion scenario 2
Transport Layer
3-23
idealization perfect knowledge
sender sends only when router buffers available
finite shared output link buffers
lin original dataloutlin original data plus
retransmitted data
copy
free buffer space
R2
R2
l out
lin
Causescosts of congestion scenario 2
Host B
A
3-24 Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
[Figure: Host A keeps a copy; the packet arrives when there is no buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; Host B.]
3-25 Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
- when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Figure: Host A resends into free buffer space toward Host B. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$. Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes marked at R/2.]
3-26 Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
- when sending at R/2, some packets are retransmissions, including duplicates that are delivered
[Figure: Host A times out and sends a copy into free buffer space toward Host B; $\lambda_{in}$, $\lambda'_{in}$, $\lambda_{out}$. Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes marked at R/2.]
3-27 Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
- when sending at R/2, some packets are retransmissions, including duplicates that are delivered
“costs” of congestion:
- more work (retrans) for given “goodput”
- unneeded retransmissions: link carries multiple copies of pkt
  - decreasing goodput
[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes marked at R/2.]
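The effect of unneeded retransmissions on goodput can be put into numbers with a back-of-the-envelope sketch. The model below is an assumption made for illustration, not something stated on the slide: a connection's share of the link is fixed, and a known fraction of the delivered packets are duplicates that carry no new data. The function name `goodput` and its parameters are hypothetical.

```python
def goodput(link_share, duplicate_fraction):
    """Useful throughput when part of the delivered traffic is duplicates.

    link_share: rate actually delivered to the receiver (e.g. R/2).
    duplicate_fraction: assumed fraction of delivered packets that are
    unneeded copies, between 0.0 and 1.0.
    """
    if not 0.0 <= duplicate_fraction <= 1.0:
        raise ValueError("duplicate_fraction must be between 0 and 1")
    return link_share * (1.0 - duplicate_fraction)


if __name__ == "__main__":
    R = 1.0  # link capacity in arbitrary units
    print(goodput(R / 2, 0.0))  # no duplicates: goodput equals the share
    print(goodput(R / 2, 0.5))  # every packet delivered twice on average
```

Under this assumed model, a sender whose every packet is delivered twice gets only half of its R/2 share as goodput.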
3-28 Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C and D with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$.]
3-29 Causes/costs of congestion: scenario 3
another “cost” of congestion:
- when packet dropped, any “upstream” transmission capacity used for that packet was wasted
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked at C/2.]
3-30 Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at
3-31 Case study: ATM ABR congestion control
ABR: available bit rate:
- “elastic service”
- if sender’s path “underloaded”: sender should use available bandwidth
- if sender’s path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches (“network-assisted”)
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact
3-32 Case study: ATM ABR congestion control
- two-byte ER (explicit rate) field in RM cell
  - congested switch may lower ER value in cell
  - senders’ send rate thus max supportable rate on path
- EFCI bit in data cells: set to 1 in congested switch
  - if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
[Figure: RM cells interspersed with data cells on the path between sender and receiver.]
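The ER mechanism amounts to taking a minimum along the path: each switch may lower the value but never raises it. A minimal sketch follows, assuming each switch can be summarized by a single supportable rate; the function name and the list-of-rates representation are hypothetical and are not part of the ATM specification.

```python
def explicit_rate_on_path(initial_er, switch_supportable_rates):
    """Return the ER value the sender sees when the RM cell comes back.

    initial_er: rate the sender wrote into the RM cell.
    switch_supportable_rates: assumed per-switch rate limits, in path order.
    """
    er = initial_er
    for supportable in switch_supportable_rates:
        # A congested switch may lower ER; it never increases it.
        er = min(er, supportable)
    return er


if __name__ == "__main__":
    # Sender asks for 100 units; the third switch can only support 60.
    print(explicit_rate_on_path(100, [80, 120, 60]))
```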
3-33 TCP congestion control
3-34 TCP congestion control: additive increase multiplicative decrease
- approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD sawtooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half.]
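The sawtooth can be reproduced with a few lines of code. This is a sketch under simplifying assumptions: time advances in whole RTTs, cwnd is measured in MSS units, and the RTTs in which loss is detected are supplied by the caller (the `loss_at` parameter is a hypothetical stand-in for the network).

```python
def aimd_trace(rtts, loss_at, start=1.0, mss=1.0):
    """Return cwnd (in MSS units) at the start of each RTT.

    rtts: number of RTTs to simulate.
    loss_at: set of RTT indices in which a loss is detected (assumed input).
    """
    cwnd = start
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            cwnd = max(mss, cwnd / 2)  # multiplicative decrease
        else:
            cwnd += mss                # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(6, {3}, start=8.0))
```

With a loss in RTT 3 and a starting window of 8 MSS, the window climbs 8, 9, 10, 11 and then restarts from 5.5.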
3-35 TCP Congestion Control: details
- sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \leq \text{cwnd}$
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
- $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed (“in-flight”, spanning cwnd), and last byte sent.]
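Both relations on this slide are simple enough to check by hand. The sketch below restates them as two functions; the function names and the `nbytes` parameter (size of the segment the sender would like to transmit next) are assumptions for illustration.

```python
def sending_rate(cwnd_bytes, rtt_seconds):
    """Approximate TCP sending rate in bytes/sec: cwnd / RTT."""
    return cwnd_bytes / rtt_seconds


def may_send(last_byte_sent, last_byte_acked, cwnd_bytes, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight_after = (last_byte_sent + nbytes) - last_byte_acked
    return in_flight_after <= cwnd_bytes


if __name__ == "__main__":
    print(sending_rate(10_000, 0.5))          # bytes per second
    print(may_send(5_000, 2_000, 4_000, 1_000))  # exactly fills the window
    print(may_send(5_000, 2_000, 4_000, 1_001))  # one byte too many
```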
3-36 TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B in successive RTTs; time runs down the page.]
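The link between "increment cwnd for every ACK" and "double cwnd every RTT" becomes clear when the ACKs of one round are counted. A minimal sketch, assuming no loss, one ACK per segment, and cwnd measured in whole MSS units:

```python
def slow_start(rtts, mss=1):
    """Return cwnd (in MSS units) at the start of each RTT, loss-free."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd // mss   # one ACK per segment sent this round
        for _ in range(acks_this_rtt):
            cwnd += mss               # increment cwnd for every ACK received
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(slow_start(3))
```

The output matches the one, two, four segment pattern in the figure, followed by eight.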
3-37 TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP RENO
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half, window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
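The difference between Tahoe and Reno fits in one small function. This is a sketch of the window reaction only, as summarized on the slide; the string labels for variants and events are hypothetical, and cwnd is in MSS units.

```python
def react_to_loss(variant, event, cwnd, mss=1):
    """Return the new cwnd after a loss event.

    variant: "tahoe" or "reno".
    event: "timeout" or "3dupacks".
    """
    if variant not in ("tahoe", "reno"):
        raise ValueError("unknown variant")
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown event")
    if event == "timeout" or variant == "tahoe":
        return mss                # back to 1 MSS
    return max(mss, cwnd / 2)     # Reno on 3 duplicate ACKs: cut in half


if __name__ == "__main__":
    for variant in ("tahoe", "reno"):
        for event in ("timeout", "3dupacks"):
            print(variant, event, react_to_loss(variant, event, 16))
```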
3-38 TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
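A per-RTT sketch shows how ssthresh separates the two growth phases after a timeout. It is an approximation under stated assumptions: cwnd is in whole MSS units, growth is applied once per RTT rather than per ACK, and no further loss occurs. The function name and parameters are hypothetical.

```python
def growth_after_timeout(cwnd_before_loss, rtts):
    """Return cwnd per RTT after a timeout, in MSS units."""
    ssthresh = cwnd_before_loss // 2  # half of cwnd just before the loss
    cwnd = 1                          # restart from 1 MSS
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2                 # slow start: exponential growth
        else:
            cwnd += 1                 # congestion avoidance: linear growth
    return trace


if __name__ == "__main__":
    print(growth_after_timeout(16, 6))
```

For a loss at cwnd = 16 the threshold is 8, so the window doubles up to 8 and then grows by one MSS per RTT.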
3-39 Summary: TCP Congestion Control
initial transition (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- Λ, cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- New ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1; dupACKcount = 0; retransmit missing segment; go to slow start
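The three-state summary maps naturally onto a small class with one method per event. The sketch below tracks only the window variables and the state; it does not send or retransmit anything. Assumptions: the class and method names are hypothetical, the MSS value of 1460 bytes is illustrative, and "ssthresh + 3" is read as ssthresh plus three MSS.

```python
MSS = 1460  # bytes; illustrative value, not taken from the slides


class RenoSender:
    """Window bookkeeping for slow start, congestion avoidance, fast recovery."""

    def __init__(self):
        self.state = "slow start"
        self.cwnd = MSS
        self.ssthresh = 64 * 1024
        self.dup_ack_count = 0

    def on_new_ack(self):
        if self.state == "slow start":
            self.cwnd += MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        elif self.state == "congestion avoidance":
            self.cwnd += MSS * (MSS / self.cwnd)
        else:  # fast recovery: deflate the window and resume linear growth
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"
        self.dup_ack_count = 0

    def on_duplicate_ack(self):
        if self.state == "fast recovery":
            self.cwnd += MSS
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS  # assumed unit: 3 MSS
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_ack_count = 0
        self.state = "slow start"


if __name__ == "__main__":
    s = RenoSender()
    s.on_new_ack()
    for _ in range(3):
        s.on_duplicate_ack()
    print(s.state, s.cwnd, s.ssthresh)
```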
3-40 TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \frac{W}{RTT}$ bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W.]
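The 3/4 factor is the mean of a window that ramps linearly from W/2 to W. A quick numerical check, assuming an even W so that W/2 is a whole number and one sample per RTT (function names are hypothetical):

```python
def average_window(w):
    """Mean of the windows W/2, W/2 + 1, ..., W for an even integer W."""
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)


def average_throughput(w_bytes, rtt_seconds):
    """Average TCP throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_seconds


if __name__ == "__main__":
    print(average_window(8), 0.75 * 8)
    print(average_throughput(80_000, 0.5))
```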
3-41 TCP Futures: TCP over “long, fat pipes”
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
- to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
- new versions of TCP for high-speed
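Both numbers on this slide follow from the stated example values. The sketch below recomputes them, solving the throughput formula for L; the function names are hypothetical, and MSS is converted to bits so that it matches a throughput given in bits per second.

```python
def segments_in_flight(rate_bps, rtt_seconds, mss_bytes):
    """Window (in segments) needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_seconds / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_seconds, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_seconds * rate_bps)) ** 2


if __name__ == "__main__":
    rate = 10e9   # 10 Gbps
    rtt = 0.1     # 100 ms
    mss = 1500    # bytes
    print(int(segments_in_flight(rate, rtt, mss)))
    print(required_loss_rate(rate, rtt, mss))
```

The first value truncates to 83333 segments, and the second comes out slightly above 2e-10, which the slide rounds to $2 \cdot 10^{-10}$.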
3-42 TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
3-43 Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line. The trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”.]
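The convergence argument can be checked with a toy simulation of two flows. The assumptions are strong and stated here explicitly: both flows have the same RTT, both add one unit per round, and both see a loss in the same round whenever their combined rate exceeds the capacity. The function name and parameters are hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Simulate two synchronized AIMD flows; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease window by factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: additive increase for both flows
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    a, b = two_flow_aimd(1.0, 70.0, capacity=100.0, rounds=500)
    print(a, b, abs(a - b))
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward the equal share line.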
3-44 Fairness (more)
Fairness and UDP
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2
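The parallel-connection example is plain arithmetic if every connection gets an equal share of the link. A minimal sketch under that assumption (the function name is hypothetical):

```python
from fractions import Fraction


def share_of_link(new_connections, existing_connections):
    """Fraction of R obtained by an app opening new_connections flows."""
    total = new_connections + existing_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(share_of_link(1, 9))    # one new connection among ten
    print(share_of_link(11, 9))   # eleven new connections among twenty
```

With 11 connections the exact share is 11/20 of R, which the slide rounds to R/2.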
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122, RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake: FSM
- TCP: closing a connection
- TCP: closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study: ATM ABR congestion control
- Case study: ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control: additive increase multiplicative decre
- TCP Congestion Control: details
- TCP Slow Start
- TCP: detecting, reacting to loss
- TCP: switching from slow start to CA
- Summary: TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over “long, fat pipes”
- TCP Fairness
- Why is TCP fair?
- Fairness (more)
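3-14 Agreeing to establish a connection
2-way handshake failure scenarios:
[Figure, first scenario: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB. The client retransmits req_conn(x); connection x completes, the client terminates and the server forgets x. The retransmitted req_conn(x) then puts the server back in ESTAB: half open connection (no client).]
[Figure, second scenario: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB and sends data(x+1), which the server accepts. The client retransmits req_conn(x) and data(x+1); connection x completes, the client terminates and the server forgets x. The retransmitted req_conn(x) puts the server back in ESTAB, and it accepts data(x+1) again.]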
3-15 TCP 3-way handshake
client state / server state: both start in LISTEN
- client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
- server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y; ACKbit=1, ACKnum=x+1); enter SYN RCVD
- client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; enter ESTAB
- server: received ACK(y) indicates client is live; enter ESTAB
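The sequence and acknowledgment numbers in the three segments depend only on the two initial sequence numbers. A minimal sketch that builds the three segments as dictionaries; the dictionary keys and the function name are hypothetical, and real TCP segments carry many more fields.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(x=100, y=500):
        print(segment)
```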
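3-16 TCP 3-way handshake: FSM
states: closed, listen, SYN rcvd, SYN sent, ESTAB
- closed → listen (server): Socket connectionSocket = welcomeSocket.accept();
- listen → SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
- SYN rcvd → ESTAB: on ACK(ACKnum=y+1); Λ (no action)
- closed → SYN sent (client): Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
- SYN sent → ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
- back to closed: Λ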
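3-16 TCP 3-way handshake FSM
States: closed, listen, SYN rcvd, SYN sent, ESTAB. Transitions are written as event / action; Λ means no action.
Client side:
- closed → SYN sent: Socket clientSocket = new Socket("hostname", "port number") / send SYN(seq=x)
- SYN sent → ESTAB: receive SYNACK(seq=y, ACKnum=x+1) / send ACK(ACKnum=y+1)
Server side:
- closed → listen: Socket connectionSocket = welcomeSocket.accept() / Λ
- listen → SYN rcvd: receive SYN(x) / send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
- SYN rcvd → ESTAB: receive ACK(ACKnum=y+1) / Λ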
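The sequence and acknowledgment numbers in the handshake can be traced with a short Python sketch. The function name three_way_handshake and the tuple layout are hypothetical, chosen only for illustration; no real segments are sent.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments as (sender, kind, fields) tuples.

    x and y are the initial sequence numbers chosen by client and server.
    """
    syn = ("client", "SYN", {"seq": x})
    # The server acknowledges the client's SYN by asking for x+1 next.
    synack = ("server", "SYNACK", {"seq": y, "ACKnum": syn[2]["seq"] + 1})
    # The client acknowledges the server's SYN in the same way.
    ack = ("client", "ACK", {"ACKnum": synack[2]["seq"] + 1})
    return [syn, synack, ack]


for sender, kind, fields in three_way_handshake(x=100, y=300):
    print(sender, kind, fields)
```

With x=100 and y=300 the SYNACK carries ACKnum=101 and the final ACK carries ACKnum=301.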
3-17 TCP: closing a connection
- client, server each close their side of connection
  - send TCP segment with FIN bit = 1
- respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled
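3-18 TCP: closing a connection
Figure: timeline of client state and server state; both start in ESTAB.
- client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
- server: replies ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
- client: on receiving the ACK, enters FIN_WAIT_2 (wait for server close)
- server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
- client: replies ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2 × max segment lifetime), then CLOSED
- server: on receiving the ACK, enters CLOSED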
3-19 Principles of congestion control
3-20 Principles of congestion control
congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem
3-21 Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- output link capacity: R
- no retransmission
Figure: Host A sends original data at rate λ_in into unlimited shared output link buffers; Host B receives throughput λ_out.
Plots: λ_out vs. λ_in (maximum per-connection throughput: R/2); delay vs. λ_in (large delays as arrival rate λ_in approaches capacity).
3-22 Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of timed-out packet
  - application-layer input = application-layer output: λ_in = λ_out
  - transport-layer input includes retransmissions: λ'_in ≥ λ_in
Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data) into finite shared output link buffers; Host B receives λ_out.
3-23 Causes/costs of congestion: scenario 2
idealization: perfect knowledge
- sender sends only when router buffers available
Figure: Host A keeps a copy of each packet and sends λ_in (original data) and λ'_in (original data plus retransmitted data) into finite shared output link buffers with free buffer space; Host B receives λ_out.
Plot: λ_out vs. λ_in, both axes up to R/2.
3-24 Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
Figure: Host A keeps a copy and sends λ_in (original data) and λ'_in (original data plus retransmitted data) toward Host B; the router has no buffer space.
3-25 Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data) toward Host B; the router has free buffer space.
Plot: λ_out vs. λ_in, both axes up to R/2. When sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
3-26 Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
Figure: Host A keeps a copy, times out, and sends λ_in and λ'_in toward Host B; the router has free buffer space; Host B receives λ_out.
Plot: λ_out vs. λ_in, both axes up to R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered.
3-27 Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
Plot: λ_out vs. λ_in, both axes up to R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered.
"costs" of congestion:
- more work (retrans) for given "goodput"
- unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput
3-28 Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
Figure: Hosts A, B, C, D connected over multihop paths with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: throughput.
3-29 Causes/costs of congestion: scenario 3
another "cost" of congestion:
- when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Plot: λ_out vs. λ'_in, both axes up to C/2.
3-30 Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at
3-31 Case study: ATM ABR congestion control
ABR: available bit rate:
- "elastic service"
- if sender's path "underloaded": sender should use available bandwidth
- if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches ("network-assisted")
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact
3-32 Case study: ATM ABR congestion control
- two-byte ER (explicit rate) field in RM cell
  - congested switch may lower ER value in cell
  - senders' send rate thus max supportable rate on path
- EFCI bit in data cells: set to 1 in congested switch
  - if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
Figure legend: RM cell, data cell.
3-33 TCP congestion control
3-34 TCP congestion control: additive increase multiplicative decrease
- approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss
Figure: AIMD sawtooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) vs. time: additively increase window size ... until loss occurs (then cut window in half).
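The sawtooth can be reproduced with a minimal Python sketch. It assumes, purely for illustration, that a loss is detected whenever cwnd reaches a fixed value loss_window (a hypothetical parameter; a real network has no such fixed threshold) and measures cwnd in units of MSS.

```python
def aimd(rounds, loss_window, mss=1):
    """Return cwnd at the start of each RTT under AIMD.

    Assumption: loss is detected in any RTT where cwnd >= loss_window.
    """
    cwnd = mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_window:
            cwnd = max(mss, cwnd // 2)  # multiplicative decrease: halve
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace


print(aimd(rounds=12, loss_window=8))
# [1, 2, 3, 4, 5, 6, 7, 8, 4, 5, 6, 7]
```

The window climbs by one MSS per RTT, drops to half at the loss, and climbs again, which is the sawtooth in the figure.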
3-35 TCP Congestion Control: details
- sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
- rate ≈ cwnd/RTT bytes/sec
Figure: sender sequence number space, showing last byte ACKed, bytes sent but not-yet ACKed ("in-flight"), and last byte sent; cwnd spans the in-flight bytes.
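As a small illustration of the two rules above, the Python sketch below checks the window constraint and evaluates the rough rate estimate. The function names can_send and approx_rate are hypothetical, and the check ignores the receive window (flow control).

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cwnd_bytes / rtt_seconds


# 3000 bytes in flight and cwnd = 4000: 1000 more bytes fit, 1001 do not.
print(can_send(5000, 2000, 4000, 1000), can_send(5000, 2000, 4000, 1001))
print(approx_rate(10_000, 0.5))  # cwnd of 10,000 bytes and an RTT of 0.5 s
```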
3-36 TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
Figure: Host A / Host B timeline: one segment sent in the first RTT, two segments in the second, four segments in the third.
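The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be seen in a few lines of Python. This is a sketch that assumes no loss, one ACK per segment, and cwnd measured in MSS; the function name slow_start is hypothetical.

```python
def slow_start(rtts, mss=1):
    """Return cwnd (in MSS) at the start of each RTT, assuming no loss."""
    cwnd = mss
    trace = []
    for _rtt in range(rtts):
        trace.append(cwnd)
        acks_this_rtt = cwnd // mss      # one ACK per segment sent this RTT
        for _ack in range(acks_this_rtt):
            cwnd += mss                  # +1 MSS per ACK, so cwnd doubles per RTT
    return trace


print(slow_start(4))  # [1, 2, 4, 8]
```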
3-37 TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP RENO
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half, window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
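The difference between the Tahoe and Reno reactions can be written as one small Python function. This is a simplified sketch following the bullets on this slide: react_to_loss and the event labels "timeout" and "3dupacks" are hypothetical names, cwnd is in MSS, and Reno's temporary window inflation during fast recovery is left out.

```python
def react_to_loss(cwnd, event, variant="reno", mss=1):
    """Return the new cwnd after a loss event ('timeout' or '3dupacks')."""
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    if event == "timeout" or variant == "tahoe":
        return mss                      # back to 1 MSS
    return max(mss, cwnd // 2)          # Reno on 3 duplicate ACKs: halve


print(react_to_loss(16, "timeout"))            # 1
print(react_to_loss(16, "3dupacks"))           # 8
print(react_to_loss(16, "3dupacks", "tahoe"))  # 1
```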
3-38 TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
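A short Python sketch of the switch: after a timeout, cwnd restarts at 1 MSS, doubles per RTT while it is below ssthresh, and then grows by 1 MSS per RTT. The function name window_trace is hypothetical, cwnd is in MSS, and no further loss is assumed during the trace.

```python
def window_trace(rtts, cwnd_before_loss, mss=1):
    """Return (ssthresh, per-RTT cwnd list) following a timeout."""
    ssthresh = cwnd_before_loss // 2    # half of cwnd just before the loss
    cwnd = mss                          # timeout: restart from 1 MSS
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2                   # slow start: exponential growth
        else:
            cwnd += mss                 # congestion avoidance: linear growth
    return ssthresh, trace


print(window_trace(rtts=7, cwnd_before_loss=16))
# (8, [1, 2, 4, 8, 9, 10, 11])
```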
3-39 Summary: TCP Congestion Control
FSM with three states: slow start, congestion avoidance, fast recovery. Transitions are written as event / action; Λ means no event or no action.
Initialization:
- Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 → slow start
slow start:
- new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK / dupACKcount++
- cwnd > ssthresh / Λ → congestion avoidance
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
- dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
congestion avoidance:
- new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK / dupACKcount++
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
- dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
fast recovery:
- duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK / cwnd = ssthresh, dupACKcount = 0 → congestion avoidance
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
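The state machine can be sketched as a small Python class. This is an illustrative model, not a TCP implementation: the class name RenoSender and its method names are hypothetical, cwnd and ssthresh are kept in units of MSS (so the default ssthresh of 64 stands in for the 64 KB on the slide), and the "retransmit" and "transmit" actions are omitted.

```python
MSS = 1  # assumption: all window values are counted in MSS


class RenoSender:
    def __init__(self, ssthresh=64):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender(ssthresh=4)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cwnd)   # congestion_avoidance 5
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd)   # fast_recovery 5.5
```

Driving the object with a sequence of events makes it easy to check each arrow of the FSM in isolation.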
3-40 TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) · W / RTT bytes/sec
Figure: sawtooth of window size oscillating between W/2 and W.
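The 3/4 factor follows from averaging a window that climbs linearly from W/2 to W. The Python sketch below checks this numerically and evaluates the formula; the function names are hypothetical, and the model assumes an ideal sawtooth with one sample per RTT.

```python
def average_window(w):
    """Mean of a window that climbs from W/2 to W in steps of 1 per RTT."""
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)


def average_throughput(w_bytes, rtt_seconds):
    """avg TCP throughput = (3/4) * W / RTT, in bytes/sec."""
    return 0.75 * w_bytes / rtt_seconds


print(average_window(100))               # 75.0, i.e. 3/4 of W
print(average_throughput(100_000, 0.5))  # 150000.0
```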
3-41 TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = 1.22 · MSS / (RTT · √L)
- to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10, a very small loss rate
- new versions of TCP for high-speed
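Both numbers on this slide can be checked with a few lines of Python. The sketch assumes MSS equals the 1500-byte segment size and works in bits so the units match the 10 Gbps target; the function names are hypothetical.

```python
def required_window_segments(rate_bps, rtt_s, segment_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (segment_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(round(w))        # 83333
print(f"{loss:.1e}")   # about 2.1e-10
```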
3-42 TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.
3-43 Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".
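The convergence toward the equal-share line can be simulated in Python. The sketch makes simplifying assumptions that are not on the slide: both sessions have the same RTT, increase by one unit per round, and both see a loss in the same round whenever their combined throughput exceeds the capacity. The function name two_flow_aimd is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Return the throughputs of two AIMD flows after the given rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # loss: both halve their window
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase, slope 1
    return x1, x2


a, b = two_flow_aimd(x1=80, x2=10, capacity=100, rounds=500)
print(round(a, 2), round(b, 2))
```

Additive increase leaves the gap between the two flows unchanged, while each halving cuts it in half, so the gap of 70 shrinks toward zero and the two throughputs end up nearly equal.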
3-44 Fairness (more)
Fairness and UDP
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2
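The arithmetic behind the parallel-connection example can be checked in Python, assuming every TCP connection on the link gets an equal share of R. The function name app_share is hypothetical.

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of R obtained by an app that opens new_conns connections
    on a link already carrying `existing` connections."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

With 11 connections the new app holds 11 of 20 connections, that is 11/20 of R, which the slide rounds to R/2.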
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122, RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion scenario 1
- Causes/costs of congestion scenario 2
- Causes/costs of congestion scenario 2 (2)
- Causes/costs of congestion scenario 2 (3)
- Causes/costs of congestion scenario 2 (4)
- Causes/costs of congestion scenario 2 (5)
- Causes/costs of congestion scenario 2 (6)
- Causes/costs of congestion scenario 3
- Causes/costs of congestion scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures TCP over "long fat pipes"
- TCP Fairness
- Why is TCP fair
- Fairness (more)
Transport Layer
3-18
FIN_WAIT_2
CLOSE_WAIT
FINbit=1 seq=y
ACKbit=1 ACKnum=y+1
ACKbit=1 ACKnum=x+1 wait for server
closecan stillsend data
can no longersend data
LAST_ACK
CLOSED
TIMED_WAIT
timed wait for 2max
segment lifetime
CLOSED
TCP closing a connection
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data
clientSocketclose()
client state server state
ESTABESTAB
Transport Layer
3-19Principles of congestion control
Transport Layer
3-20
congestion informally ldquotoo many sources sending too much data too
fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router
buffers)a top-10 problem
Principles of congestion control
Transport Layer
3-21
Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission
maximum per-connection throughput R2
unlimited shared output link buffers
Host A
original data lin
Host B
throughput lout
R2
R2
l out
lin R2de
lay
lin large delays as arrival
rate lin approaches capacity
Transport Layer
3-22
one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout
transport-layer input includes retransmissions lin lin
finite shared output link buffers
Host A
lin original data
Host B
loutlin original data plus retransmitted data
Causescosts of congestion scenario 2
Transport Layer
3-23
idealization perfect knowledge
sender sends only when router buffers available
finite shared output link buffers
lin original dataloutlin original data plus
retransmitted data
copy
free buffer space
R2
R2
l out
lin
Causescosts of congestion scenario 2
Host B
A
Transport Layer
3-24
lin original dataloutlin original data plus
retransmitted data
copy
no buffer space
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
Causescosts of congestion scenario 2
A
Host B
Transport Layer
3-25
lin original dataloutlin original data plus
retransmitted data
free buffer space
Causescosts of congestion scenario 2
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
R2
R2lin
l out
when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)
A
Host B
Transport Layer
3-26
A
lin loutlincopy
free buffer space
timeout
R2
R2lin
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
Host B
Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Causescosts of congestion scenario 2
Transport Layer
3-27
R2
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple
copies of pkt decreasing goodput
R2lin
Causescosts of congestion scenario 2 Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Summary: TCP Congestion Control
[Figure: FSM with three states: slow start, congestion avoidance, fast recovery. Transitions are listed below as event: actions.]
initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
  new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
  cwnd > ssthresh: go to congestion avoidance
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
congestion avoidance:
  new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
fast recovery:
  duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
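The state machine can be sketched as a small Python class. This is an illustration only: the class name `RenoSender`, the MSS value of 1460 bytes, and the method names are assumptions, and retransmission and segment transmission are omitted so that only the cwnd, ssthresh, and dupACKcount bookkeeping is shown.

```python
MSS = 1460  # bytes; assumed value for illustration


class RenoSender:
    """Tracks cwnd, ssthresh and the FSM state; sends nothing."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                     # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                     # inflate per dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"         # missing segment resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                # missing segment resent here
```

Feeding the object a sequence of events (for example one new ACK, three duplicate ACKs, then a new ACK) walks it from slow start through fast recovery into congestion avoidance.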
TCP throughput
avg. TCP throughput as function of window size, RTT?
ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
avg. window size (# in-flight bytes) is 3/4 W
avg. throughput is 3/4 W per RTT
$\text{avg TCP throughput} = \dfrac{3}{4}\,\dfrac{W}{\text{RTT}}$ bytes/sec
[Figure: window size oscillating between W/2 and W.]
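The 3/4 factor is the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, assuming an idealized sawtooth sampled once per RTT (function names are hypothetical):

```python
def sawtooth_average(w):
    """Mean window over one cycle climbing from w/2 to w (w even, in MSS)."""
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec: 3/4 of W per RTT."""
    return 0.75 * w_bytes / rtt_s
```

For W = 100 the mean window comes out at 75, that is 3/4 of W.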
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
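The two numbers in the example can be recomputed from the formula. The sketch below converts MSS to bits so that throughput is in bits/sec; the function names are assumptions for illustration.

```python
import math


def required_window_segments(target_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to fill the pipe."""
    return target_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss_prob):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))


def required_loss_rate(target_bps, rtt_s, mss_bytes):
    """Solve the same formula for L given a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2
```

With 1500-byte segments, a 100 ms RTT, and a 10 Gbps target, `required_window_segments` gives about 83,333 and `required_loss_rate` gives about 2.1e-10, in line with the figures above.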
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput plotted against Connection 1 throughput, each axis running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
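The convergence argument can be tried numerically: additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half. The sketch assumes both sessions see every loss at the same time and have equal RTTs; `two_flow_aimd` is a hypothetical name.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss seen by both: decrease window by factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: additive increase for both
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from very unequal rates, for example `two_flow_aimd(1.0, 41.0, 50.0, 200)`, the two values end up nearly equal.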
Fairness (more)
Fairness and UDP
multimedia apps often do not use TCP
  do not want rate throttled by congestion control
instead use UDP:
  send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this
e.g., link of rate R with 9 existing connections:
  new app asks for 1 TCP, gets rate R/10
  new app asks for 11 TCPs, gets roughly R/2 (11 of the 20 connections)
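The arithmetic behind the parallel-connection example, assuming every TCP connection on the link gets an equal share (the function name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, slightly more than half.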
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122 RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- TCP Fairness
- Why is TCP fair
- Fairness (more)
Principles of congestion control
congestion:
informally: "too many sources sending too much data too fast for network to handle"
different from flow control
manifestations:
  lost packets (buffer overflow at routers)
  long delays (queueing in router buffers)
a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate $\lambda_{in}$ approaches capacity
[Figure: Host A sends original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; Host B receives throughput $\lambda_{out}$. Two plots against $\lambda_{in}$, with R/2 marked: $\lambda_{out}$ and delay.]
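The slide gives no formula for the delay curve. As an assumption made purely for illustration, the router can be treated as an M/M/1 queue, whose mean time in system is 1/(capacity - arrival rate). The sketch below uses that textbook model, with rates in packets/sec; it is not taken from the slides.

```python
def mm1_delay(arrival_rate, capacity):
    """Mean time in an M/M/1 queue (assumed model), in seconds."""
    if arrival_rate >= capacity:
        return float("inf")   # queue grows without bound
    return 1.0 / (capacity - arrival_rate)


def scenario1_delay(lambda_in, capacity):
    """Two senders each offer lambda_in to one shared link."""
    return mm1_delay(2 * lambda_in, capacity)
```

As each sender's rate approaches half the link capacity, the computed delay grows without bound, which is the qualitative behavior the slide describes.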
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with finite shared output link buffers; Host B receives $\lambda_{out}$.]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
sender sends only when router buffers available
[Figure: same setup; Host A keeps a copy of each packet and sends when there is free buffer space. Plot of $\lambda_{out}$ against $\lambda_{in}$, with R/2 marked on both axes.]
Causes/costs of congestion: scenario 2
Idealization: known loss
packets can be lost, dropped at router due to full buffers
sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: same setup; Host A keeps a copy of each packet, which is dropped when the router has no buffer space and resent when there is free buffer space. Plot of $\lambda_{out}$ against $\lambda_{in}$, with R/2 marked on both axes.]
Causes/costs of congestion: scenario 2
Realistic: duplicates
packets can be lost, dropped at router due to full buffers
sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
  more work (retransmissions) for given "goodput"
  unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput
[Figure: same setup; Host A keeps a copy, times out, and resends while there is free buffer space. Plot of $\lambda_{out}$ against $\lambda_{in}$, with R/2 marked on both axes.]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C, and D connected over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: throughput.]
Causes/costs of congestion: scenario 3
another "cost" of congestion:
when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, with C/2 marked on both axes.]
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
  single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  explicit rate for sender to send at
Case study: ATM ABR congestion control
ABR: available bit rate:
"elastic service"
if sender's path "underloaded": sender should use available bandwidth
if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
sent by sender, interspersed with data cells
bits in RM cell set by switches ("network-assisted"):
  NI bit: no increase in rate (mild congestion)
  CI bit: congestion indication
RM cells returned to sender by receiver, with bits intact
Case study: ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell:
  congested switch may lower ER value in cell
  sender's send rate is thus the maximum supportable rate on path
EFCI bit in data cells: set to 1 in congested switch
  if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
[Figure: RM cells interspersed with data cells on the path between sender and receiver.]
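The ER and EFCI rules can be illustrated with a toy model. This is a sketch, not the ATM cell format: the `RMCell` class, its field names, and the per-switch list of supportable rates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RMCell:
    er: float          # explicit rate requested by the sender
    ni: bool = False   # no increase (mild congestion)
    ci: bool = False   # congestion indication


def forward_through_switches(cell, supportable_rates):
    """Each switch may lower ER to the rate it can support."""
    for rate in supportable_rates:
        cell.er = min(cell.er, rate)
    return cell


def receiver_return(cell, preceding_data_cell_efci):
    """Receiver sets CI if the data cell before the RM cell had EFCI set."""
    if preceding_data_cell_efci:
        cell.ci = True
    return cell
```

After passing switches that support 80, 120, and 50 units, a cell that started with ER = 100 comes back with ER = 50, the smallest rate on the path.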
TCP congestion control
Transport Layer
3-21
Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission
maximum per-connection throughput R2
unlimited shared output link buffers
Host A
original data lin
Host B
throughput lout
R2
R2
l out
lin R2de
lay
lin large delays as arrival
rate lin approaches capacity
Transport Layer
3-22
one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout
transport-layer input includes retransmissions lin lin
finite shared output link buffers
Host A
lin original data
Host B
loutlin original data plus retransmitted data
Causescosts of congestion scenario 2
Transport Layer
3-23
idealization perfect knowledge
sender sends only when router buffers available
finite shared output link buffers
lin original dataloutlin original data plus
retransmitted data
copy
free buffer space
R2
R2
l out
lin
Causescosts of congestion scenario 2
Host B
A
Transport Layer
3-24
lin original dataloutlin original data plus
retransmitted data
copy
no buffer space
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
Causescosts of congestion scenario 2
A
Host B
Transport Layer
3-25
lin original dataloutlin original data plus
retransmitted data
free buffer space
Causescosts of congestion scenario 2
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
R2
R2lin
l out
when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)
A
Host B
Transport Layer
3-26
A
lin loutlincopy
free buffer space
timeout
R2
R2lin
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
Host B
Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Causescosts of congestion scenario 2
Transport Layer
3-27
R2
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple
copies of pkt decreasing goodput
R2lin
Causescosts of congestion scenario 2 Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Summary: TCP Congestion Control
[FSM with three states: slow start, congestion avoidance, fast recovery. Λ marks a transition or action with no triggering event.]
initialization (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
  new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  cwnd > ssthresh (Λ): go to congestion avoidance
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
congestion avoidance:
  new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
fast recovery:
  duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
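The state machine summarized on this slide can be sketched as a small class. It tracks only cwnd, ssthresh and the duplicate-ACK counter; actual transmission and retransmission are not modeled, and the class and method names are illustrative assumptions rather than any real TCP stack's API.

```python
class RenoSender:
    """Sketch of the three-state congestion-control FSM; sizes are in bytes."""

    def __init__(self, mss=1460):
        self.mss = mss
        self.cwnd = mss               # cwnd = 1 MSS
        self.ssthresh = 64 * 1024     # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss     # window inflation per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"   # missing segment would be resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"          # missing segment would be resent here
```

Feeding the object a sequence of `on_new_ack`, `on_dup_ack` and `on_timeout` calls reproduces the transitions listed in the summary.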
TCP throughput
avg. TCP throughput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. throughput is 3/4 W per RTT
avg TCP throughput = (3/4) * W / RTT bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W]
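A quick numeric check of the 3/4 W figure, assuming the idealized sawtooth in which the window ramps linearly from W/2 up to W and then halves. The function names are illustrative.

```python
def avg_window(w, steps=1000):
    """Average of a window that ramps linearly from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec under the 3/4 W approximation."""
    return 0.75 * w_bytes / rtt_seconds
```

The average of a linear ramp is the midpoint of its endpoints, (W/2 + W)/2 = 3/4 W, which is what `avg_window` converges to.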
TCP Futures: TCP over "long, fat pipes"
example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10, a very small loss rate
new versions of TCP for high-speed
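The two numbers in this example can be reproduced from the formula above. The sketch assumes throughput in bits per second and MSS in bytes, so the segment size is converted to bits; the helper names are made up for illustration.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)       # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)  # about 2e-10
```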
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
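The convergence argument can be checked with a toy simulation. It assumes two flows with equal RTTs that see losses at the same time, whenever their combined rate exceeds the link capacity; these synchronization assumptions and the function name are not from the slides.

```python
def simulate_two_flows(x1, x2, capacity, rounds):
    """Synchronized AIMD for two flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve (multiplicative decrease)
        else:
            x1, x2 = x1 + 1, x2 + 1    # no loss: both add 1 (additive increase)
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the rates drift toward the equal-share line.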
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP: do not want rate throttled by congestion control
  instead use UDP: send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
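The arithmetic behind the parallel-connection example, assuming every TCP connection on the link ends up with an equal share. `app_share` is a hypothetical helper.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by the app opening new connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

`app_share(9, 1)` is 1/10 of R, while `app_share(9, 11)` is 11/20 of R, slightly more than half.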
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λin = λout
  transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A sends λin (original data) and λ'in (original data, plus retransmitted data) through a router with finite shared output link buffers; Host B receives λout]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: Host A keeps a copy of each packet and sends λin (original data) and λ'in (original data, plus retransmitted data) into finite shared output link buffers with free buffer space; Host B receives λout. Plot: λout versus λin, both axes up to R/2]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy and sends λin (original data) and λ'in (original data, plus retransmitted data) toward Host B; the router has no buffer space]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: Host A sends λin (original data) and λ'in (original data, plus retransmitted data) toward Host B; the router has free buffer space. Plot: λout versus λin, both axes up to R/2]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
[Figure: Host A keeps a copy and resends it after a timeout while the router has free buffer space; Host B receives λout. Plot: λout versus λin, both axes up to R/2]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
  more work (retrans) for given "goodput"
  unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput
[Plot: λout versus λin, both axes up to R/2]
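One way to see how duplicate copies reduce goodput is a back-of-the-envelope model. It is an assumption of this sketch, not something stated on the slides: the link carries at most its share of traffic, and each delivered packet crosses the link some average number of times.

```python
def goodput(offered_load, link_share, copies_per_packet):
    """Useful delivery rate when each packet crosses the link several times.

    offered_load      -- transport-layer load, including retransmissions
    link_share        -- capacity available to this sender (R/2 in scenario 2)
    copies_per_packet -- assumed average number of copies carried per packet
    """
    if copies_per_packet < 1:
        raise ValueError("each delivered packet crosses the link at least once")
    carried = min(offered_load, link_share)  # link cannot carry more than its share
    return carried / copies_per_packet
```

With R = 1 and a share of 0.5, one copy per packet gives a goodput of 0.5, while two copies per packet give 0.25: the same link work delivers half as much useful data.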
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C and D send over multihop paths through routers with finite shared output link buffers; λin is original data, λ'in is original data plus retransmitted data, λout is the delivered rate]
Causes/costs of congestion: scenario 3
another "cost" of congestion:
  when packet dropped, any upstream transmission capacity used for that packet was wasted
[Plot: λout versus λ'in, both axes up to C/2]
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
  single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  explicit rate for sender to send at
Case study: ATM ABR congestion control
ABR: available bit rate:
  "elastic service"
  if sender's path "underloaded": sender should use available bandwidth
  if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
  sent by sender, interspersed with data cells
  bits in RM cell set by switches ("network-assisted")
    NI bit: no increase in rate (mild congestion)
    CI bit: congestion indication
  RM cells returned to sender by receiver, with bits intact
Case study: ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell
  congested switch may lower ER value in cell
  sender's send rate thus max supportable rate on path
EFCI bit in data cells: set to 1 in congested switch
  if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
[Figure: stream of RM cells and data cells between sender and receiver]
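A sketch of how an RM cell might pick up feedback along a path. The encoding of each switch as a (supportable rate, congestion level) pair, the level names, and the class and function names are all assumptions made for this illustration; they are not part of the ATM specification.

```python
class RMCell:
    """Resource-management cell carrying the feedback fields named above."""

    def __init__(self, er):
        self.er = er        # explicit rate carried in the cell
        self.ni = False     # no-increase bit (mild congestion)
        self.ci = False     # congestion-indication bit


def traverse(cell, switches):
    """Pass an RM cell through switches given as (supportable_rate, level) pairs.

    level is "none", "mild" or "congested" (hypothetical labels).
    """
    for supportable_rate, level in switches:
        if level != "none":
            # a congested switch may lower the ER value in the cell
            cell.er = min(cell.er, supportable_rate)
        if level == "mild":
            cell.ni = True
        elif level == "congested":
            cell.ci = True
    return cell
```

After the cell returns to the sender, `cell.er` holds the lowest rate any congested switch on the path was willing to support, which is the rate the sender adopts.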
TCP congestion control
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD sawtooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; the window is additively increased until loss occurs, then cut in half]
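The sawtooth can be reproduced with a few lines. The sketch measures cwnd in MSS and assumes, purely for illustration, that a loss happens whenever cwnd reaches a fixed value `loss_at`; real losses depend on the network.

```python
def aimd_trace(cwnd, loss_at, rounds):
    """Return cwnd (in MSS) at each RTT under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(cwnd // 2, 1)   # loss: cut the window in half
        else:
            cwnd += 1                  # no loss: add 1 MSS per RTT
    return trace
```

Starting at 4 MSS with losses at 8 MSS, the window climbs 4, 5, 6, 7, 8 and then drops back to 4, which is the sawtooth pattern in the figure.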
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Transport Layer
3-39
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer
3-40TCP throughput
avg TCP thruput as function of window size RTTignore slow start assume always data to
sendW window size (measured in bytes) where loss
occursavg window size ( in-flight bytes) is frac34 Wavg thruput is 34W per RTTW
W2
avg TCP thruput = 34WRTTbytessec
Transport Layer
3-41TCP Futures TCP over ldquolong fat pipesrdquoexample 1500 byte segments 100ms RTT want
10 Gbps throughputrequires W = 83333 in-flight segmentsthroughput in terms of segment loss probability L
[Mathis 1997]
to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash a very small loss rate
new versions of TCP for high-speed
TCP throughput = 122 MSSRTT L
Transport Layer
3-42
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP Fairness
TCP connection 2
Transport Layer
3-43
Why is TCP fairtwo competing sessionsadditive increase gives slope of 1 as throughout
increasesmultiplicative decrease decreases throughput
proportionally R
R
equal bandwidth share
Connection 1 throughputCon
nec t
ion
2 t h
roug
hput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer
3-44
Fairness (more)Fairness and UDP
multimedia apps often do not use TCP do not want rate throttled
by congestion control
instead use UDP send audiovideo at
constant rate tolerate packet loss
Fairness parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this eg link of rate R with 9
existing connectionsnew app asks for 1 TCP gets
rate R10new app asks for 11 TCPs
gets R2
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122 RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causescosts of congestion scenario 1
- Causescosts of congestion scenario 2
- Causescosts of congestion scenario 2 (2)
- Causescosts of congestion scenario 2 (3)
- Causescosts of congestion scenario 2 (4)
- Causescosts of congestion scenario 2 (5)
- Causescosts of congestion scenario 2 (6)
- Causescosts of congestion scenario 3
- Causescosts of congestion scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures TCP over ldquolong fat pipesrdquo
- TCP Fairness
- Why is TCP fair
- Fairness (more)
-
Transport Layer
3-23
idealization perfect knowledge
sender sends only when router buffers available
finite shared output link buffers
lin original dataloutlin original data plus
retransmitted data
copy
free buffer space
R2
R2
l out
lin
Causescosts of congestion scenario 2
Host B
A
Transport Layer
3-24
lin original dataloutlin original data plus
retransmitted data
copy
no buffer space
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
Causescosts of congestion scenario 2
A
Host B
Transport Layer
3-25
lin original dataloutlin original data plus
retransmitted data
free buffer space
Causescosts of congestion scenario 2
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
R2
R2lin
l out
when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)
A
Host B
Transport Layer
3-26
A
lin loutlincopy
free buffer space
timeout
R2
R2lin
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
Host B
Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Causescosts of congestion scenario 2
Transport Layer
3-27
R2
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple
copies of pkt decreasing goodput
R2lin
Causescosts of congestion scenario 2 Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Transport Layer
3-39
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer
3-40TCP throughput
avg TCP thruput as function of window size RTTignore slow start assume always data to
sendW window size (measured in bytes) where loss
occursavg window size ( in-flight bytes) is frac34 Wavg thruput is 34W per RTTW
W2
avg TCP thruput = 34WRTTbytessec
Transport Layer
3-41TCP Futures TCP over ldquolong fat pipesrdquoexample 1500 byte segments 100ms RTT want
10 Gbps throughputrequires W = 83333 in-flight segmentsthroughput in terms of segment loss probability L
[Mathis 1997]
to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash a very small loss rate
new versions of TCP for high-speed
TCP throughput = 122 MSSRTT L
Transport Layer
3-42
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP Fairness
TCP connection 2
Transport Layer
3-43
Why is TCP fairtwo competing sessionsadditive increase gives slope of 1 as throughout
increasesmultiplicative decrease decreases throughput
proportionally R
R
equal bandwidth share
Connection 1 throughputCon
nec t
ion
2 t h
roug
hput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer
3-44
Fairness (more)Fairness and UDP
multimedia apps often do not use TCP do not want rate throttled
by congestion control
instead use UDP send audiovideo at
constant rate tolerate packet loss
Fairness parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this eg link of rate R with 9
existing connectionsnew app asks for 1 TCP gets
rate R10new app asks for 11 TCPs
gets R2
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122 RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causescosts of congestion scenario 1
- Causescosts of congestion scenario 2
- Causescosts of congestion scenario 2 (2)
- Causescosts of congestion scenario 2 (3)
- Causescosts of congestion scenario 2 (4)
- Causescosts of congestion scenario 2 (5)
- Causescosts of congestion scenario 2 (6)
- Causescosts of congestion scenario 3
- Causescosts of congestion scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures TCP over ldquolong fat pipesrdquo
- TCP Fairness
- Why is TCP fair
- Fairness (more)
-
Transport Layer
3-24
lin original dataloutlin original data plus
retransmitted data
copy
no buffer space
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
Causescosts of congestion scenario 2
A
Host B
Transport Layer
3-25
lin original dataloutlin original data plus
retransmitted data
free buffer space
Causescosts of congestion scenario 2
Idealization known loss packets can be lost dropped at router due to full buffers
sender only resends if packet known to be lost
R2
R2lin
l out
when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)
A
Host B
Transport Layer
3-26
A
lin loutlincopy
free buffer space
timeout
R2
R2lin
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
Host B
Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Causescosts of congestion scenario 2
Transport Layer
3-27
R2
l out
when sending at R2 some packets are retransmissions including duplicated that are delivered
ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple
copies of pkt decreasing goodput
R2lin
Causescosts of congestion scenario 2 Realistic duplicates packets can be lost
dropped at router due to full buffers
sender times out prematurely sending two copies both of which are delivered
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Transport Layer
3-39
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer
3-40TCP throughput
avg TCP thruput as function of window size RTTignore slow start assume always data to
sendW window size (measured in bytes) where loss
occursavg window size ( in-flight bytes) is frac34 Wavg thruput is 34W per RTTW
W2
avg TCP thruput = 34WRTTbytessec
Transport Layer
3-41TCP Futures TCP over ldquolong fat pipesrdquoexample 1500 byte segments 100ms RTT want
10 Gbps throughputrequires W = 83333 in-flight segmentsthroughput in terms of segment loss probability L
[Mathis 1997]
to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash a very small loss rate
new versions of TCP for high-speed
TCP throughput = 122 MSSRTT L
Transport Layer
3-42
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP Fairness
TCP connection 2
Transport Layer
3-43
Why is TCP fairtwo competing sessionsadditive increase gives slope of 1 as throughout
increasesmultiplicative decrease decreases throughput
proportionally R
R
equal bandwidth share
Connection 1 throughputCon
nec t
ion
2 t h
roug
hput
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
Transport Layer
3-44
Fairness (more)Fairness and UDP
multimedia apps often do not use TCP do not want rate throttled
by congestion control
instead use UDP send audiovideo at
constant rate tolerate packet loss
Fairness parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this eg link of rate R with 9
existing connectionsnew app asks for 1 TCP gets
rate R10new app asks for 11 TCPs
gets R2
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122, RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study: ATM ABR congestion control
- Case study: ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control: additive increase multiplicative decre
- TCP Congestion Control: details
- TCP Slow Start
- TCP: detecting, reacting to loss
- TCP: switching from slow start to CA
- Summary: TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- TCP Fairness
- Why is TCP fair?
- Fairness (more)
Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
- when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: Host A, Host B, router with free buffer space; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out. Plot of λ_out against λ_in, axes marked R/2.]
Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
- when sending at R/2, some packets are retransmissions, including duplicates that are delivered
[Figure: Host A, Host B; λ_in, λ'_in, λ_out; timeout at sender, copy of packet, free buffer space. Plot of λ_out against λ_in, axes marked R/2.]
Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
- when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
- more work (retrans) for given "goodput"
- unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput
[Figure: plot of λ_out against λ_in, axes marked R/2.]
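The cost of duplicate copies can be put into numbers with a deliberately crude model. The sketch below is an illustration only: the function name `estimated_goodput`, the parameter `avg_copies_per_packet`, and the assumption that every delivered packet crosses the link the same average number of times are not from the slides.

```python
def estimated_goodput(sending_rate, link_share, avg_copies_per_packet=1.0):
    """Rough goodput estimate for scenario 2 (assumed simplified model).

    sending_rate          -- original data plus retransmissions offered to the link
    link_share            -- this sender's share of the link (R/2 on the slides)
    avg_copies_per_packet -- assumed average number of copies of each packet
                             that the link carries (1.0 means no duplicates)
    """
    if avg_copies_per_packet < 1.0:
        raise ValueError("each delivered packet is carried at least once")
    # The link cannot carry more than this sender's share.
    carried = min(sending_rate, link_share)
    # Only one copy of each packet counts as useful (goodput).
    return carried / avg_copies_per_packet


R = 10.0  # hypothetical link rate, arbitrary units
print(estimated_goodput(R / 2, R / 2))        # no duplicates
print(estimated_goodput(R / 2, R / 2, 2.0))   # every packet carried twice
```

Under this model, a link that carries every packet twice delivers half as much useful data for the same amount of work.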
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Host A, Host B, Host C, Host D; finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out.]
Causes/costs of congestion: scenario 3
another "cost" of congestion:
- when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: plot of λ_out against λ'_in, axes marked C/2.]
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at
Case study: ATM ABR congestion control
ABR: available bit rate:
- "elastic service"
- if sender's path "underloaded": sender should use available bandwidth
- if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches ("network-assisted")
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact
Case study: ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell:
- congested switch may lower ER value in cell
- sender's send rate thus max supportable rate on path
EFCI bit in data cells: set to 1 in congested switch
- if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
[Figure: RM cells and data cells travelling between sender and receiver.]
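A minimal sketch of how the ER field and the EFCI/CI bits could interact along a path. The function names, the list-of-rates representation of the switches, and the rule that a switch lowers ER to exactly its own supportable rate are assumptions for illustration, not the ATM specification.

```python
def er_after_path(initial_er, switch_supportable_rates):
    """Return the ER value left in an RM cell after it crosses every switch.

    Each switch may lower the ER value but never raises it, so the sender
    ends up with the smallest rate any switch on the path can support.
    """
    er = initial_er
    for rate in switch_supportable_rates:
        er = min(er, rate)
    return er


def ci_bit_for_returned_rm_cell(preceding_data_cell_efci):
    """Receiver sets CI in the returned RM cell if the data cell that
    preceded it arrived with EFCI set by a congested switch."""
    return bool(preceding_data_cell_efci)


# Hypothetical path with three switches (rates in arbitrary units).
print(er_after_path(100, [80, 120, 60]))
print(ci_bit_for_returned_rm_cell(True))
```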
TCP congestion control
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
- additive increase: increase cwnd by 1 MSS every RTT until loss detected
- multiplicative decrease: cut cwnd in half after loss
[Figure: cwnd (TCP sender congestion window size) against time. AIMD sawtooth behavior, probing for bandwidth: additively increase window size until loss occurs, then cut window in half.]
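The sawtooth can be reproduced with a few lines of simulation. This is a sketch under stated assumptions: cwnd is counted in whole MSS units, one loop iteration stands for one RTT, and a loss is assumed whenever cwnd exceeds a fixed `capacity_mss` (a hypothetical parameter, not something the slides define).

```python
def aimd_trace(capacity_mss, rounds, start_mss=1):
    """Return the cwnd value (in MSS) at the start of each RTT."""
    cwnd = start_mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_mss:
            # Multiplicative decrease: loss assumed, cut the window in half.
            cwnd = max(1, cwnd // 2)
        else:
            # Additive increase: one more MSS per RTT.
            cwnd += 1
    return trace


print(aimd_trace(capacity_mss=8, rounds=12, start_mss=5))
```

The printed values climb by one per round and then drop to about half, repeating the same tooth.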
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked ≤ cwnd
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd / RTT bytes/sec
[Figure: sender sequence number space, showing last byte ACKed; bytes sent but not yet ACKed ("in-flight"); last byte sent; the span of cwnd.]
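The window check and the rate estimate translate directly into code. A small sketch, assuming byte counters named after the slide's variables and treating the limit as "in-flight bytes may not exceed cwnd"; the function names are illustrative only.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    # Assumption: the limit is inclusive (in-flight <= cwnd).
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cwnd_bytes / rtt_seconds


# Hypothetical numbers: 10 segments of 1460 bytes, RTT of 0.25 s.
print(can_send(last_byte_sent=20000, last_byte_acked=8000, cwnd=14600, nbytes=1460))
print(approx_rate(14600, 0.25))
```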
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B in successive RTTs; time runs downward.]
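Incrementing cwnd by one MSS per ACK is what produces the doubling per RTT. The sketch below assumes one ACK comes back for every segment sent in a round (no delayed ACKs, no loss); `slow_start_rounds` is a name made up for this example.

```python
def slow_start_rounds(rtts, mss=1):
    """Return cwnd at the start of each RTT, beginning with 1 MSS."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        # One ACK is assumed to arrive for each segment sent this round.
        acks = cwnd // mss
        for _ in range(acks):
            cwnd += mss  # per-ACK increment
        history.append(cwnd)
    return history


print(slow_start_rounds(3))
```

The printed list matches the one, two, four segment pattern in the figure, continued for one more round.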
TCP: detecting, reacting to loss
loss indicated by timeout:
- cwnd set to 1 MSS
- window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
- dup ACKs indicate network capable of delivering some segments
- cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
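The two loss reactions and the ssthresh rule can be summarized in one function. This is a simplified sketch in MSS units: it ignores the temporary window inflation used during fast recovery, and the names `on_loss`, `"timeout"` and `"triple_dup_ack"` are chosen for this example.

```python
def on_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, ssthresh) after a loss event, in MSS units."""
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError("unknown loss event: " + repr(event))
    if variant not in ("reno", "tahoe"):
        raise ValueError("unknown variant: " + repr(variant))
    # ssthresh is set to half of cwnd just before the loss event.
    ssthresh = cwnd / 2
    if event == "timeout" or variant == "tahoe":
        # Back to 1 MSS; the window then grows exponentially up to
        # ssthresh and linearly afterwards.
        return 1, ssthresh
    # Reno on 3 duplicate ACKs: cut the window in half, grow linearly.
    return ssthresh, ssthresh


print(on_loss(16, "timeout"))
print(on_loss(16, "triple_dup_ack", variant="reno"))
print(on_loss(16, "triple_dup_ack", variant="tahoe"))
```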
Summary: TCP Congestion Control
[Figure: FSM with three states. Each transition is listed below as event: actions, followed by the next state where it changes. Λ means no event or no action.]
slow start
- Λ (initial): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
- cwnd > ssthresh: Λ; go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
congestion avoidance
- new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
fast recovery
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1; dupACKcount = 0; retransmit missing segment; go to slow start
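The state machine can be sketched as a small class that tracks only the window variables. Assumptions made here, none of which come from the slides: the class and method names, windows kept in bytes as floats, the "+ 3" in the fast-recovery entry read as 3 MSS, and no actual segment transmission or retransmission.

```python
class RenoSender:
    """Window bookkeeping for the three-state congestion control FSM."""

    def __init__(self, mss=1460):
        self.mss = mss                  # assumed segment size in bytes
        self.state = "slow_start"
        self.cwnd = float(mss)          # cwnd = 1 MSS
        self.ssthresh = 64 * 1024.0     # ssthresh = 64 KB
        self.dup_acks = 0

    def on_timeout(self):
        # Same actions from every state; always ends in slow start.
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            # Roughly one MSS of growth per RTT.
            self.cwnd += self.mss * (self.mss / self.cwnd)
        else:  # fast recovery: deflate the window and resume linear growth
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss  # "+ 3" read as 3 MSS
            self.state = "fast_recovery"


sender = RenoSender(mss=1000)
sender.on_new_ack()
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cwnd, sender.ssthresh)
```

Keeping the timeout handler identical for all three states mirrors the figure, where each timeout arrow carries the same actions and leads to slow start.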
TCP throughput
avg. TCP thruput as function of window size, RTT?
- ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
- avg. window size (# in-flight bytes) is 3/4 W
- avg. thruput is 3/4 W per RTT
  avg TCP thruput = (3/4) · W / RTT bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W.]
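The 3/4 factor comes from averaging a window that ramps linearly from W/2 up to W. A short sketch that checks this numerically and evaluates the formula; the function names and the sample values are illustrative assumptions.

```python
def sawtooth_mean(w, samples=1001):
    """Average of a window that ramps linearly from w/2 up to w."""
    step = (w / 2) / (samples - 1)
    values = [w / 2 + i * step for i in range(samples)]
    return sum(values) / len(values)


def avg_tcp_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec: 3/4 of W per RTT."""
    return 0.75 * w_bytes / rtt_seconds


W = 100_000  # hypothetical window size at loss, in bytes
print(sawtooth_mean(W) / W)          # close to 0.75
print(avg_tcp_throughput(W, 0.5))    # bytes/sec for an RTT of 0.5 s
```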
TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = 1.22 · MSS / (RTT · √L)
- to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10, a very small loss rate!
- new versions of TCP for high-speed
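Both numbers in this example can be rechecked from the formula. The sketch assumes MSS is given in bytes and throughput in bits per second, so a factor of 8 appears; the function names are made up for this check.

```python
import math


def window_segments(target_bps, rtt_s, mss_bytes):
    """In-flight segments needed to fill the pipe: rate x RTT / segment size."""
    return target_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(target_bps, rtt_s, mss_bytes):
    """Solve the same formula for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


print(int(window_segments(10e9, 0.1, 1500)))   # in-flight segments
print(required_loss(10e9, 0.1, 1500))          # roughly 2e-10
```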
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes up to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
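The convergence argument can be watched in a toy simulation. The sketch assumes two flows with equal RTTs that both detect loss in the same round whenever their combined rate exceeds the link capacity; these synchrony assumptions and all names are for illustration only.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Return the two flows' rates after the given number of rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow by 1, so the gap stays the same.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share: flow 1 at 1 unit, flow 2 at 40 units.
a, b = two_flow_aimd(1.0, 40.0, capacity=50.0, rounds=500)
print(abs(a - b))
```

The gap is never widened by an increase and is halved by every loss, so the two rates drift toward the equal bandwidth share line.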
Fairness (more)
Fairness and UDP
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2
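The arithmetic behind the parallel-connection example, assuming the link is split equally per connection. `share_of_link` is a hypothetical helper; it returns the application's share as a fraction of R.

```python
from fractions import Fraction


def share_of_link(new_connections, existing_connections):
    """Fraction of the link rate R obtained by the new application."""
    total = new_connections + existing_connections
    return Fraction(new_connections, total)


print(share_of_link(1, 9))    # one connection among ten
print(share_of_link(11, 9))   # eleven connections among twenty
```

With 11 of 20 connections the exact share is 11/20 of R, which the slide rounds to R/2.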
Transport Layer
3-28
four sendersmultihop paths timeoutretransmit
Q what happens as lin and lin
rsquo increase
finite shared output link buffers
Host A lout
Causescosts of congestion scenario 3
Host B
Host CHost D
lin original data
lin original data plus retransmitted data
A as red linrsquo increases all
arriving blue pkts at upper queue are dropped blue throughput g 0
Transport Layer
3-29
another ldquocostrdquo of congestion when packet dropped any ldquoupstream
transmission capacity used for that packet was wasted
Causescosts of congestion scenario 3
C2
C2
l out
linrsquo
Transport Layer
3-30
Approaches towards congestion controltwo broad approaches towards congestion
controlend-end congestion
controlno explicit
feedback from network
congestion inferred from end-system observed loss delay
approach taken by TCP
network-assisted congestion control
routers provide feedback to end systemssingle bit indicating
congestion (SNA DECbit TCPIP ECN ATM)
explicit rate for sender to send at
Transport Layer
3-31Case study ATM ABR congestion
controlABR available bit rateldquoelastic servicerdquo if senderrsquos path
ldquounderloadedrdquo sender should use
available bandwidth
if senderrsquos path congested sender throttled to
minimum guaranteed rate
RM (resource management) cellssent by sender interspersed
with data cellsbits in RM cell set by switches
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild
congestion) CI bit congestion indication
RM cells returned to sender by receiver with bits intact
Transport Layer
3-32Case study ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path
EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in
returned RM cell
RM cell data cell
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Transport Layer
3-39
Summary TCP Congestion Control
timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment
Lcwnd gt ssthresh
congestionavoidance
cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed
new ACK
dupACKcount++duplicate ACK
fastrecovery
cwnd = cwnd + MSStransmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment
ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment
dupACKcount == 3cwnd = ssthreshdupACKcount = 0
New ACK
slow start
timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment
cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
new ACKdupACKcount++duplicate ACK
Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0
NewACK
NewACK
NewACK
Transport Layer
3-40TCP throughput
avg TCP thruput as function of window size RTTignore slow start assume always data to
sendW window size (measured in bytes) where loss
occursavg window size ( in-flight bytes) is frac34 Wavg thruput is 34W per RTTW
W2
avg TCP thruput = 34WRTTbytessec
Transport Layer
3-41TCP Futures TCP over ldquolong fat pipesrdquoexample 1500 byte segments 100ms RTT want
10 Gbps throughputrequires W = 83333 in-flight segmentsthroughput in terms of segment loss probability L
[Mathis 1997]
to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash a very small loss rate
new versions of TCP for high-speed
TCP throughput = 122 MSSRTT L
Transport Layer
3-42
fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckroutercapacity R
TCP Fairness
TCP connection 2
Transport Layer
3-43
Why is TCP fair?
two competing sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
[figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the equal bandwidth share line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
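The convergence argument can be tried out with a toy simulation. It assumes both sessions see the same RTT, both detect loss at the same instant whenever their combined rate exceeds R, and rates are in arbitrary units; the function name and the starting rates are illustrative only.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Run two synchronised AIMD flows; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both halve, which also halves the gap between them
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both move along a slope-1 line, gap unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    # start far from the fair share: 80 versus 5 on a link of capacity 100
    print(aimd_two_flows(80.0, 5.0, 100.0, 2000))
```

Additive increase leaves the difference between the two rates untouched and every loss event halves it, so the pair drifts toward the equal bandwidth share line.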
Transport Layer
3-44
Fairness (more)
Fairness and UDP
multimedia apps often do not use TCP: do not want rate throttled by congestion control
instead use UDP: send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this
e.g., link of rate R with 9 existing connections:
new app asks for 1 TCP, gets rate R/10
new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
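The arithmetic behind the parallel-connection example is a per-connection split of the link. The helper below is a hypothetical illustration that assumes every connection gets an equal share of R.

```python
from fractions import Fraction


def share_of_link(new_connections, existing_connections):
    """Fraction of the link rate R obtained by the app opening new_connections."""
    total = new_connections + existing_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(share_of_link(1, 9))    # 1/10 of R
    print(share_of_link(11, 9))   # 11/20 of R, a little over half
```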
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122, RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study: ATM ABR congestion control
- Case study: ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control: additive increase, multiplicative decre
- TCP Congestion Control: details
- TCP Slow Start
- TCP: detecting, reacting to loss
- TCP: switching from slow start to CA
- Summary: TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over “long fat pipes”
- TCP Fairness
- Why is TCP fair?
- Fairness (more)
Transport Layer
3-29
Causes/costs of congestion: scenario 3
another “cost” of congestion: when packet dropped, any “upstream” transmission capacity used for that packet was wasted
[figure: axes labelled λout and λin′, with C/2 marked]
Transport Layer
3-30
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP
network-assisted congestion control:
routers provide feedback to end systems
single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
explicit rate for sender to send at
Transport Layer
3-31
Case study: ATM ABR congestion control
ABR: available bit rate:
“elastic service”
if sender’s path “underloaded”: sender should use available bandwidth
if sender’s path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
sent by sender, interspersed with data cells
bits in RM cell set by switches (“network-assisted”)
NI bit: no increase in rate (mild congestion)
CI bit: congestion indication
RM cells returned to sender by receiver, with bits intact
Transport Layer
3-32
Case study: ATM ABR congestion control
two-byte ER (explicit rate) field in RM cell
congested switch may lower ER value in cell
senders’ send rate thus max supportable rate on path
EFCI bit in data cells: set to 1 in congested switch
if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
[figure: RM cells and data cells along the path]
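The way an RM cell collects feedback along the path can be sketched with two small classes. Everything here is a simplified model rather than the ATM specification: the class and field names, the rate units, and the rule that each switch clamps ER to its own supportable rate are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RMCell:
    er: float          # explicit rate field (hypothetical units, e.g. Mbps)
    ci: bool = False   # congestion indication bit
    ni: bool = False   # no-increase bit (mild congestion)


@dataclass
class Switch:
    supportable_rate: float
    congested: bool = False
    mildly_congested: bool = False

    def process(self, cell):
        # A switch may lower ER but never raises it, so the cell ends up
        # carrying the smallest supportable rate seen on the path.
        cell.er = min(cell.er, self.supportable_rate)
        if self.congested:
            cell.ci = True
        elif self.mildly_congested:
            cell.ni = True


def traverse(cell, path):
    """Pass the RM cell through every switch; the receiver returns it intact."""
    for switch in path:
        switch.process(cell)
    return cell


if __name__ == "__main__":
    path = [Switch(80.0), Switch(40.0, congested=True), Switch(60.0, mildly_congested=True)]
    print(traverse(RMCell(er=100.0), path))
```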
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
additive increase: increase cwnd by 1 MSS every RTT until loss detected
multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
additively increase window size ... until loss occurs (then cut window in half)
[figure: cwnd (TCP sender congestion window size) plotted against time]
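The saw tooth can be generated with a few lines of code. The sketch below counts cwnd in MSS units and assumes, purely for illustration, that a loss is detected whenever cwnd reaches a fixed threshold; real losses depend on the network, and the function name is made up.

```python
def aimd_trace(loss_threshold_mss, rtts, start_mss=1.0):
    """Return cwnd (in MSS) at the start of each RTT under AIMD."""
    cwnd = start_mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_threshold_mss:
            cwnd = cwnd / 2   # multiplicative decrease: cut cwnd in half
        else:
            cwnd += 1         # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(loss_threshold_mss=8, rtts=12))
```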
Transport Layer
3-35
TCP Congestion Control: details
sender limits transmission:
$\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
$\text{rate} \approx \frac{\text{cwnd}}{RTT}$ bytes/sec
[figure: sender sequence number space, showing last byte ACKed; sent, not-yet ACKed (“in-flight”); last byte sent; cwnd]
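The window limit and the rate estimate translate directly into code. The helper names below are hypothetical and flow control (the receive window) is ignored, as on this slide.

```python
def bytes_allowed(last_byte_sent, last_byte_acked, cwnd):
    """How many more bytes may be sent without exceeding cwnd bytes in flight."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, cwnd - in_flight)


def approx_rate(cwnd, rtt_s):
    """Rough sending rate in bytes/sec: one cwnd of data per RTT."""
    return cwnd / rtt_s


if __name__ == "__main__":
    print(bytes_allowed(last_byte_sent=5000, last_byte_acked=3000, cwnd=4000))
    print(approx_rate(cwnd=14600, rtt_s=0.1))
```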
Transport Layer
3-36
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
initially cwnd = 1 MSS
double cwnd every RTT
done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
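Incrementing cwnd once per ACK is what produces the doubling per RTT. The sketch below assumes one ACK per segment, no delayed ACKs, and no loss; cwnd is counted in segments and the function name is illustrative.

```python
def slow_start_rounds(rtts, mss=1):
    """Return the number of segments sent in each RTT during slow start."""
    cwnd = mss
    sent_per_rtt = []
    for _ in range(rtts):
        sent_per_rtt.append(cwnd)
        acks = cwnd // mss        # one ACK comes back per segment sent
        for _ in range(acks):
            cwnd += mss           # +1 MSS for every ACK received
    return sent_per_rtt


if __name__ == "__main__":
    print(slow_start_rounds(4))   # one, two, four, eight segments
```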
Transport Layer
3-37
TCP: detecting, reacting to loss
loss indicated by timeout:
cwnd set to 1 MSS
window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
dup ACKs indicate network capable of delivering some segments
cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
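The difference between the two variants fits in one function. It follows the simplified description on this slide (Reno halves cwnd on three duplicate ACKs, with no fast-recovery window inflation); the function name and the string labels are assumptions, and cwnd is in MSS units.

```python
def cwnd_after_loss(variant, event, cwnd, mss=1):
    """Return the new cwnd after a loss event for 'tahoe' or 'reno'."""
    if variant not in ("tahoe", "reno"):
        raise ValueError("unknown variant: " + variant)
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown event: " + event)
    if variant == "reno" and event == "3dupacks":
        return cwnd / 2   # network still delivers segments: cut in half
    return mss            # timeout, or any loss under Tahoe: back to 1 MSS


if __name__ == "__main__":
    for variant in ("tahoe", "reno"):
        for event in ("timeout", "3dupacks"):
            print(variant, event, cwnd_after_loss(variant, event, 16))
```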
Transport Layer
3-38
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
variable ssthresh
on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer
3-33
TCP congestion control
Transport Layer
3-34
TCP congestion control additive increase multiplicative decrease approach sender increases transmission
rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1
MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half
after loss
cwnd
TCP
sen
der
cong
estio
n w
indo
w s
ize
AIMD saw toothbehavior probing
for bandwidth
additively increase window size helliphellip until loss occurs (then cut window in half)
time
Transport Layer
3-35TCP Congestion Control details
sender limits transmission
cwnd is dynamic function of perceived network congestion
TCP sending rate
roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte
ACKed sent not-yet ACKed(ldquoin-flightrdquo)
last byte sent
cwnd
LastByteSent-LastByteAcked
lt cwnd
sender sequence number space
rate~~cwndRTT bytessec
Transport Layer
3-36TCP Slow Start
when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for
every ACK received
summary initial rate is slow but ramps up exponentially fast
Host A
one segment
RTT
Host B
time
two segments
four segments
Transport Layer
3-37
TCP detecting reacting to loss
loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then
grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer
3-38
Q when should the exponential increase switch to linear
A when cwnd gets to 12 of its value before timeout
Implementationvariable ssthresh on loss event ssthresh is
set to 12 of cwnd just before loss event
TCP switching from slow start to CA
Summary: TCP Congestion Control
[State diagram with three states: slow start, congestion avoidance, fast recovery. Transitions are listed below as "event: actions".]
start: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start
  new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  cwnd > ssthresh: no action; go to congestion avoidance
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
congestion avoidance
  new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
fast recovery
  duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
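The state machine on this slide can be sketched as a small class that tracks only the window variables. This is an illustration under stated assumptions, not a TCP implementation: the class name, method names, and the MSS value of 1460 bytes are made up here, "+ 3" is read as 3 MSS, and the retransmit and transmit actions are omitted because there is no network.

```python
MSS = 1460  # bytes; an assumed segment size for illustration


class RenoSender:
    """Tracks state, cwnd, ssthresh, and dupACKcount per the summary slide."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS                      # exponential growth per RTT
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * MSS / self.cwnd    # about +1 MSS per RTT
        else:                                     # leaving fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_acks = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                      # inflate window per dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                    # fast retransmit would go here
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):                         # same actions in every state
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender()
for _ in range(3):
    s.on_new_ack()
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd, s.ssthresh)  # fast_recovery 7300.0 2920.0
```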
TCP throughput
avg. TCP throughput as a function of window size and RTT? ignore slow start, assume there is always data to send
W: window size (measured in bytes) where loss occurs
avg. window size (number of in-flight bytes) is 3/4 W
avg. throughput is 3/4 W per RTT
[Figure: sawtooth of the window size oscillating between W/2 and W]
avg TCP throughput = (3/4) * W / RTT bytes/sec
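The 3/4 factor is the mean of a window that ramps linearly from W/2 up to W. A brief sketch, with sample values chosen only for illustration:

```python
def avg_window(w_bytes):
    """Mean of a window that ramps linearly from W/2 up to W."""
    return (w_bytes / 2 + w_bytes) / 2      # equals 3/4 * W


def avg_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec under the sawtooth model."""
    return avg_window(w_bytes) / rtt_seconds


print(avg_window(80000))             # 60000.0
print(avg_throughput(80000, 0.25))   # 240000.0
```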
TCP Futures: TCP over "long, fat pipes"
example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10, a very small loss rate
new versions of TCP for high-speed
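The two numbers in this example can be checked with a few lines of arithmetic. The function names are made up for illustration; MSS is converted to bits so that throughput comes out in bits/sec.

```python
import math


def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput(mss_bytes, rtt_s, loss):
    """Throughput in bits/sec: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Solve the throughput formula for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss(10e9, 0.1, 1500)
print(round(w))   # 83333
print(loss)       # about 2.1e-10, which the slide rounds to 2 * 10^-10
```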
TCP Fairness
fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput plotted against Connection 1 throughput, each axis running from 0 to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
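The convergence argument can be checked numerically. This sketch makes strong simplifying assumptions: both flows have the same RTT, both add 1 unit per round, and both see a loss and halve in the same round whenever their combined rate exceeds the link capacity. The function name and numbers are illustrative.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Idealized synchronized AIMD for two flows sharing one link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both cut by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase: move along slope 1
    return x1, x2


x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2) < 0.01)  # True
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward the equal bandwidth share line.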
Fairness (more)
Fairness and UDP
multimedia apps often do not use TCP: do not want rate throttled by congestion control
instead use UDP: send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this
e.g., link of rate R with 9 existing connections:
new app asks for 1 TCP, gets rate R/10
new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
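The shares in this example follow from dividing the link equally among connections rather than among applications. A short sketch, assuming every connection gets an equal share of R (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R a new app gets with new_conns connections,
    assuming every connection receives an equal share."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```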
- Congestion in networks
- TCP reliable data transfer
- TCP sender events
- TCP sender (simplified)
- TCP retransmission scenarios
- TCP retransmission scenarios (2)
- TCP ACK generation [RFC 1122 RFC 2581]
- TCP fast retransmit
- TCP fast retransmit (2)
- TCP flow control
- TCP flow control (2)
- Connection Management
- Agreeing to establish a connection
- Agreeing to establish a connection (2)
- TCP 3-way handshake
- TCP 3-way handshake FSM
- TCP closing a connection
- TCP closing a connection (2)
- Principles of congestion control
- Principles of congestion control (2)
- Causes/costs of congestion: scenario 1
- Causes/costs of congestion: scenario 2
- Causes/costs of congestion: scenario 2 (2)
- Causes/costs of congestion: scenario 2 (3)
- Causes/costs of congestion: scenario 2 (4)
- Causes/costs of congestion: scenario 2 (5)
- Causes/costs of congestion: scenario 2 (6)
- Causes/costs of congestion: scenario 3
- Causes/costs of congestion: scenario 3 (2)
- Approaches towards congestion control
- Case study ATM ABR congestion control
- Case study ATM ABR congestion control (2)
- TCP congestion control
- TCP congestion control additive increase multiplicative decre
- TCP Congestion Control details
- TCP Slow Start
- TCP detecting reacting to loss
- TCP switching from slow start to CA
- Summary TCP Congestion Control
- TCP throughput
- TCP Futures: TCP over "long, fat pipes"
- TCP Fairness
- Why is TCP fair
- Fairness (more)