Congestion in networks


Transport Layer

3-2

TCP reliable data transfer

  • TCP creates rdt service on top of IP's unreliable service
      - pipelined segments
      - cumulative ACKs
      - single retransmission timer
  • retransmissions triggered by:
      - timeout events
      - duplicate ACKs
  • let's initially consider a simplified TCP sender:
      - ignore duplicate ACKs
      - ignore flow control, congestion control

TCP sender events

data rcvd from app:
  • create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running
      - think of timer as for oldest unacked segment
      - expiration interval: TimeOutInterval

timeout:
  • retransmit segment that caused timeout
  • restart timer

ack rcvd:
  • if ack acknowledges previously unacked segments
      - update what is known to be ACKed
      - start timer if there are still unacked segments

TCP sender (simplified)

single state: wait for event

initialization (Λ):
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

event: data received from application above
    create segment, seq #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer

event: timeout
    retransmit not-yet-acked segment with smallest seq #
    start timer

event: ACK received, with ACK field value y
    if (y > SendBase)
        SendBase = y
        (SendBase-1: last cumulatively ACKed byte)
        if (there are currently not-yet-acked segments)
            start timer
        else
            stop timer
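
The three events above can be sketched as a small Python class. This is only a toy model under stated assumptions: there is no real network or clock, the "timer" is a boolean flag, and the names `SimplifiedTcpSender`, `unacked`, and `sent` are illustrative and do not come from the slides.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified sender: one timer, cumulative ACKs."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq      # NextSeqNum
        self.send_base = initial_seq     # SendBase
        self.unacked = {}                # seq # -> payload still awaiting an ACK
        self.timer_running = False
        self.sent = []                   # log of (seq #, payload) passed to "IP"

    def on_app_data(self, data):
        """Event: data received from application above."""
        seq = self.next_seq
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        """Event: timeout. Retransmit the oldest not-yet-acked segment."""
        if self.unacked:
            seq = min(self.unacked)
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        """Event: ACK received with ACK field value y (cumulative)."""
        if y > self.send_base:
            self.send_base = y
            # Drop every segment whose last byte is below y.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Keep the timer only while something is still unacked.
            self.timer_running = bool(self.unacked)
```

With `initial_seq=92`, sending 8 bytes and then 20 bytes and receiving ACK 120 moves `send_base` to 120 and stops the timer; a late ACK 100 is then ignored.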

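TCP retransmission scenarios

[Figure: two Host A / Host B timelines]

lost ACK scenario:
  1. A -> B: Seq=92, 8 bytes of data
  2. B -> A: ACK=100 (lost, X)
  3. timeout at A; A -> B: Seq=92, 8 bytes of data (retransmission)
  4. B -> A: ACK=100

premature timeout:
  1. A -> B: Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  2. B -> A: ACK=100, then ACK=120
  3. timeout at A before the ACKs arrive; A -> B: Seq=92, 8 bytes of data (retransmission)
  4. B -> A: ACK=120

SendBase values marked on the timelines: 92, 100, 120, 120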

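TCP retransmission scenarios

[Figure: Host A / Host B timeline]

cumulative ACK:
  1. A -> B: Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  2. B -> A: ACK=100 (lost, X), then ACK=120
  3. ACK=120 arrives at A before the timeout, so nothing is retransmitted
  4. A -> B: Seq=120, 15 bytes of data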

TCP ACK generation [RFC 1122, RFC 2581]

event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK

event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

event at receiver: arrival of out-of-order segment, higher-than-expected seq #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit

  • time-out period often relatively long:
      - long delay before resending lost packet
  • detect lost segments via duplicate ACKs
      - sender often sends many segments back-to-back
      - if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit:
  • if sender receives 3 duplicate ACKs for the same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
      - likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit

[Figure: fast retransmit after sender receipt of triple duplicate ACK]

  1. A -> B: Seq=92, 8 bytes of data
  2. A -> B: Seq=100, 20 bytes of data (lost, X)
  3. B -> A: ACK=100, followed by three duplicate ACK=100
  4. A -> B: Seq=100, 20 bytes of data (retransmitted before the timeout expires)
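
The triple-duplicate-ACK rule can be sketched as a counter over the stream of ACK numbers the sender sees. The function name `fast_retransmit_trigger` and its `threshold` parameter are illustrative assumptions, not part of the slides.

```python
def fast_retransmit_trigger(acks, threshold=3):
    """Return (index, ack) where the threshold-th duplicate ACK arrives, else None.

    The first ACK for a value is the "original"; each later ACK with the
    same value counts as one duplicate.
    """
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                return i, ack        # resend the segment starting at byte `ack`
        else:
            last_ack = ack
            dup_count = 0
    return None
```

For the timeline above, the ACK stream `[100, 100, 100, 100]` fires on the fourth ACK (the third duplicate), so the segment with Seq=100 is resent without waiting for the timer.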

TCP flow control

[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code into the TCP socket receiver buffers (OS); the application process reads from those buffers]

application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

TCP flow control

  • receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
      - RcvBuffer size set via socket options (typical default is 4096 bytes)
      - many operating systems autoadjust RcvBuffer
  • sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  • guarantees receive buffer will not overflow

[Figure: receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads enter the buffer, data leaves to the application process]
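
A minimal sketch of the bookkeeping behind rwnd follows. The slide gives only the figure, so the variables `last_byte_rcvd` and `last_byte_read` and both function names are assumptions that follow the usual textbook description: free space is the buffer size minus the data received but not yet read by the application.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space the receiver would advertise, in bytes."""
    buffered = last_byte_rcvd - last_byte_read   # data waiting for the application
    return max(rcv_buffer - buffered, 0)


def can_send(last_byte_sent, last_byte_acked, nbytes, rwnd_value):
    """Sender-side check: would nbytes more keep in-flight data within rwnd?"""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd_value
```

With a 4096-byte RcvBuffer and 2000 bytes buffered, the receiver would advertise 2096 bytes; a sender with 1000 bytes in flight could then send 1000 more bytes but not 1100.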

Connection Management

before exchanging data, sender/receiver "handshake":
  • agree to establish connection (each knowing the other willing to establish connection)
  • agree on connection parameters

[Figure: client and server, each with an application layer and a network layer, each holding:]
    connection state: ESTAB
    connection variables:
        seq # client-to-server, server-to-client
        rcvBuffer size at server, client

client:  Socket clientSocket = new Socket("hostname", "port number");
server:  Socket connectionSocket = welcomeSocket.accept();

Agreeing to establish a connection

2-way handshake:
[Figure: "Let's talk" / "OK", after which both sides are ESTAB. In protocol terms: client chooses x and sends req_conn(x); server replies acc_conn(x); both sides are ESTAB]

Q: will 2-way handshake always work in network?
  • variable delays
  • retransmitted messages (e.g. req_conn(x)) due to message loss
  • message reordering
  • can't "see" other side

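Agreeing to establish a connection

2-way handshake failure scenarios:

scenario 1: half open connection (no client)
  1. client: choose x, send req_conn(x); server: ESTAB, send acc_conn(x); client: ESTAB
  2. client: retransmit req_conn(x)
  3. connection x completes; client terminates, server forgets x
  4. retransmitted req_conn(x) arrives at server; server: ESTAB, a half open connection with no client

scenario 2: duplicate data accepted
  1. client: choose x, send req_conn(x); server: ESTAB, send acc_conn(x); client: ESTAB
  2. client: send data(x+1); server: accept data(x+1)
  3. client: retransmit req_conn(x), retransmit data(x+1)
  4. connection x completes; client terminates, server forgets x
  5. retransmitted req_conn(x) arrives at server; server: ESTAB
  6. retransmitted data(x+1) arrives at server; server: accept data(x+1)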

TCP 3-way handshake

client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB

  1. client: choose init seq num, x; send TCP SYN msg
         SYNbit=1, Seq=x
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
         SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
         ACKbit=1, ACKnum=y+1
  4. server: received ACK(y) indicates client is live
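
The header fields exchanged in the three steps can be written out as a short sketch. The `Segment` dataclass and `three_way_handshake` function are illustrative names, and only the flags and numbers shown on the slide are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Only the header fields that matter for the handshake."""
    syn: bool = False
    ack: bool = False
    seq: Optional[int] = None
    acknum: Optional[int] = None


def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = Segment(syn=True, seq=x)                                     # client -> server
    synack = Segment(syn=True, ack=True, seq=y, acknum=syn.seq + 1)    # server -> client
    ack = Segment(ack=True, acknum=synack.seq + 1)                     # client -> server
    return [syn, synack, ack]
```

Each side acknowledges the other's initial sequence number plus one, which is how a stale or retransmitted SYN can be told apart from the current connection attempt.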

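TCP 3-way handshake: FSM

states: closed, listen, SYN rcvd, SYN sent, ESTAB
(transitions written as event / action; Λ means no event or no action)

client side:
  closed -> SYN sent:   Socket clientSocket = new Socket("hostname", "port number"); / SYN(seq=x)
  SYN sent -> ESTAB:    SYNACK(seq=y, ACKnum=x+1) / ACK(ACKnum=y+1)

server side:
  closed -> listen:     Socket connectionSocket = welcomeSocket.accept(); / Λ
  listen -> SYN rcvd:   SYN(x) / SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client
  SYN rcvd -> ESTAB:    ACK(ACKnum=y+1) / Λ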

TCP: closing a connection

  • client, server each close their side of connection
      - send TCP segment with FIN bit = 1
  • respond to received FIN with ACK
      - on receiving FIN, ACK can be combined with own FIN
  • simultaneous FIN exchanges can be handled

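TCP: closing a connection

[Figure: client state and server state timelines, both starting in ESTAB]

  1. client: clientSocket.close(); send FINbit=1, seq=x; enter FIN_WAIT_1 (can no longer send but can receive data)
  2. server: send ACKbit=1, ACKnum=x+1; enter CLOSE_WAIT (can still send data)
  3. client: enter FIN_WAIT_2 (wait for server close)
  4. server: send FINbit=1, seq=y; enter LAST_ACK (can no longer send data)
  5. client: send ACKbit=1, ACKnum=y+1; enter TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
  6. server: on receiving the ACK, enter CLOSED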

Principles of congestion control

congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control
  • manifestations:
      - lost packets (buffer overflow at routers)
      - long delays (queueing in router buffers)
  • a top-10 problem

Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • output link capacity: R
  • no retransmission

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput: $\lambda_{out}$]

[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2] maximum per-connection throughput: R/2
[Plot: delay vs. $\lambda_{in}$, up to R/2] large delays as arrival rate, $\lambda_{in}$, approaches capacity

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of timed-out packet
      - application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
      - transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A and Host B; finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 2

idealization: perfect knowledge
  • sender sends only when router buffers available

[Figure: sender A keeps a copy of each packet; router has free buffer space; finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$ to Host B. Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2]

Causes/costs of congestion: scenario 2

idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: sender A keeps a copy of each packet; the router has no buffer space when the packet first arrives, then free buffer space for the resent copy. Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2]

when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)

Causes/costs of congestion: scenario 2

realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: sender A times out while its packet is still queued and sends a second copy; router has free buffer space, so both copies reach Host B. Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2]

when sending at R/2, some packets are retransmissions, including duplicates that are delivered

"costs" of congestion:
  • more work (retransmissions) for a given "goodput"
  • unneeded retransmissions: link carries multiple copies of a pkt
      - decreasing goodput

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

[Figure: Hosts A, B, C, D; finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput $\to 0$

[Plot: $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to C/2]

another "cost" of congestion:
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      - explicit rate for sender to send at

Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path "underloaded":
      - sender should use available bandwidth
  • if sender's path congested:
      - sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      - NI bit: no increase in rate (mild congestion)
      - CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

Case study: ATM ABR congestion control

  • two-byte ER (explicit rate) field in RM cell
      - congested switch may lower ER value in cell
      - senders' send rate thus max supportable rate on path
  • EFCI bit in data cells: set to 1 in congested switch
      - if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure: RM cells and data cells on the path between sender and receiver]

TCP congestion control

TCP congestion control: additive increase, multiplicative decrease

  • approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
      - additive increase: increase cwnd by 1 MSS every RTT until loss detected
      - multiplicative decrease: cut cwnd in half after loss

[Plot: cwnd (TCP sender congestion window size) vs. time. AIMD sawtooth behavior, probing for bandwidth: additively increase window size ... until loss occurs (then cut window in half)]
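
The sawtooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions that are not in the slides: cwnd is counted in whole MSS units, time advances one RTT per step, and a loss is assumed to occur whenever cwnd exceeds a fixed `capacity`.

```python
def aimd(cwnd, capacity, rtts):
    """Return the cwnd value (in MSS) observed at the start of each RTT."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease after loss
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return trace


print(aimd(cwnd=1, capacity=4, rtts=8))   # [1, 2, 3, 4, 5, 2, 3, 4]
```

The printed trace climbs by 1 MSS per RTT, overshoots the assumed capacity at 5, is halved to 2, and starts climbing again.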

TCP Congestion Control: details

  • sender limits transmission:

    $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$

  • cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes

    $\text{rate} \approx \dfrac{\text{cwnd}}{\text{RTT}}$ bytes/sec

[Figure: sender sequence number space. From left to right: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent. cwnd spans the in-flight region]

TCP Slow Start

  • when connection begins, increase rate exponentially until first loss event:
      - initially cwnd = 1 MSS
      - double cwnd every RTT
      - done by incrementing cwnd for every ACK received
  • summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline. One segment in the first RTT, then two segments, then four segments]
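
A short sketch shows why adding 1 MSS per ACK doubles cwnd each RTT. It assumes no loss, one ACK per segment, and a full window sent every RTT; the function name `slow_start` is illustrative.

```python
def slow_start(rtts, mss=1):
    """Return cwnd at the start of each RTT, growing by one MSS per ACK."""
    cwnd = mss
    trace = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss           # one ACK comes back per segment sent
        for _ in range(acks):
            cwnd += mss              # increment cwnd for every ACK received
        trace.append(cwnd)
    return trace


print(slow_start(3))   # [1, 2, 4, 8]
```

The trace matches the figure: one segment, then two, then four.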

TCP: detecting, reacting to loss

  • loss indicated by timeout:
      - cwnd set to 1 MSS
      - window then grows exponentially (as in slow start) to threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs: TCP RENO
      - dup ACKs indicate network capable of delivering some segments
      - cwnd is cut in half, window then grows linearly
  • TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event

Summary: TCP Congestion Control

(FSM with three states; transitions written as event: actions; Λ means no event or no action)

initialization (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start

slow start:
  new ACK:            cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK:      dupACKcount++
  cwnd > ssthresh:    Λ; go to congestion avoidance
  dupACKcount == 3:   ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
  timeout:            ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

congestion avoidance:
  new ACK:            cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK:      dupACKcount++
  dupACKcount == 3:   ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
  timeout:            ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

fast recovery:
  duplicate ACK:      cwnd = cwnd + MSS; transmit new segment(s), as allowed
  new ACK:            cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  timeout:            ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
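
The state machine above can be sketched as a small Python class that tracks only cwnd, ssthresh and the state. Several details are assumptions made for this sketch: the class and method names are illustrative, windows are kept in bytes, the "+ 3" in the fast-recovery entry is read as 3 MSS, and the `cwnd > ssthresh` check is folded into the new-ACK handler. No segments are actually sent.

```python
class RenoSender:
    """Window bookkeeping for the slow start / CA / fast recovery FSM."""

    def __init__(self, mss=1000, ssthresh=64_000):
        self.mss = mss
        self.cwnd = mss                  # cwnd = 1 MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "slow start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        elif self.state == "congestion avoidance":
            self.cwnd += self.mss * self.mss / self.cwnd   # about 1 MSS per RTT
        else:                                              # fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss       # assumed: "+ 3" means 3 MSS
            self.state = "fast recovery"                   # (missing segment resent here)

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow start"                          # (missing segment resent here)
```

Driving it with four new ACKs, three duplicate ACKs, one new ACK and a timeout walks through all three states in turn.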

TCP throughput

  • avg. TCP thruput as function of window size, RTT?
      - ignore slow start, assume always data to send
  • W: window size (measured in bytes) where loss occurs
      - avg. window size (# in-flight bytes) is 3/4 W
      - avg. thruput is 3/4 W per RTT

[Plot: window size sawtooth between W/2 and W]

$$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{\text{RTT}} \ \text{bytes/sec}$$
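
The 3/4 factor follows from the sawtooth shape, assuming (as the slide does) that slow start is ignored and the window climbs linearly from W/2 back to W between losses. The average of a linear ramp is the midpoint of its endpoints:

```latex
\[
\overline{\text{window}} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W ,
\qquad
\text{avg TCP thruput} = \frac{\overline{\text{window}}}{\text{RTT}}
                       = \frac{3}{4}\,\frac{W}{\text{RTT}} \ \text{bytes/sec}
\]
```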

TCP Futures: TCP over "long, fat pipes"

  • example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

$$\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
  • new versions of TCP for high-speed
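
Both numbers on this slide can be checked with a few lines of arithmetic. The function names are illustrative; the second function simply solves the throughput formula above for L, with MSS converted from bytes to bits.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Loss probability L solving rate = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)    # about 83,333 segments
loss = required_loss(10e9, 0.1, 1500)   # about 2.1e-10
```

The window comes out at roughly 83,333 segments and the loss rate at roughly 2 * 10^-10, matching the figures quoted on the slide.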

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Plot: Connection 1 throughput (horizontal axis, up to R) vs. Connection 2 throughput (vertical axis, up to R), with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
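
The convergence argument can be checked numerically. The sketch below assumes an idealized model that the slide only implies: both sessions have the same RTT, both see every loss at the same time, and a loss happens whenever their combined rate exceeds the link capacity. The function name `two_flow_aimd` is illustrative.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase: the gap is unchanged
    return x1, x2


a, b = two_flow_aimd(10, 80, capacity=100, steps=200)
print(abs(a - b) < 1)   # True
```

Additive increase moves the pair parallel to the equal-share line and each halving cuts their difference in two, so flows that start at 10 and 80 end up less than one unit apart.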

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      - do not want rate throttled by congestion control
  • instead use UDP:
      - send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      - new app asks for 1 TCP, gets rate R/10
      - new app asks for 11 TCPs, gets roughly R/2 (11 of the 20 connections)
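
The parallel-connection example is simple arithmetic if each TCP connection is assumed to get an equal share of the link, which is the fairness goal stated earlier. The helper name `app_share` is illustrative.

```python
from fractions import Fraction


def app_share(existing, new):
    """Fraction of link rate R obtained by an app opening `new` connections."""
    return Fraction(new, existing + new)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

With 9 existing connections, one new connection gets R/10, while 11 new connections together get 11/20 of R, slightly more than R/2.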

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control: additive increase, multiplicative decrease
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP: detecting, reacting to loss
  • TCP: switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 2: Congestion in networks

Transport Layer

3-2

TCP reliable data transferTCP creates rdt service on

top of IPrsquos unreliable service pipelined segments cumulative acks single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer

3-3

TCP sender eventsdata rcvd from app

create segment with seq

seq is byte-stream number of first data byte in segment

start timer if not already running think of timer as for oldest

unacked segment expiration interval TimeOutInterval

timeout

retransmit segment that caused timeout

restart timer ack rcvd

if ack acknowledges previously unacked segments update what is known to

be ACKed start timer if there are still

unacked segments

Transport Layer

3-4TCP sender (simplified)

waitfor event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

L

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) start timer else stop timer

ACK received with ACK field value y

Transport Layer

3-5TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtimeo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

timeo

utACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100SendBase=120

SendBase=120

SendBase=92

Transport Layer

3-6TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

timeo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer

3-7TCP ACK generation [RFC 1122 RFC 2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer

3-8TCP fast retransmittime-out period often

relatively long long delay before

resending lost packet

detect lost segments via duplicate ACKs sender often sends many

segments back-to-back if segment is lost there will

likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq likely that unacked

segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer

3-9

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

timeo

ut

ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer

3-10

TCP flow controlapplicationprocess

TCP socketreceiver buffers

TCPcode

IPcode

applicationOS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer

3-11

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application processreceiver ldquoadvertisesrdquo free

buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket

options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer

sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer

3-12Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquoagree to establish connection (each knowing the

other willing to establish connection)agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer

3-13

Q will 2-way handshake always work in network

variable delaysretransmitted messages (eg

req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talkOK ESTAB

ESTAB

choose x req_conn(x)ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer

3-14

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

esserverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose x req_conn(x)ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose x req_conn(x)ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer

3-15TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data received ACK(y)

indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTENserver state

LISTEN

Transport Layer

3-16

TCP 3-way handshake FSM

closed

L

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client

SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)

L

Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer

3-19Principles of congestion control

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Transport Layer

3-24

lin original dataloutlin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer

3-25

lin original dataloutlin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2lin

l out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer

3-26

A

lin loutlincopy

free buffer space

timeout

R2

R2lin

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer

3-27

R2

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2lin

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer

3-28

four sendersmultihop paths timeoutretransmit

Q what happens as lin and lin

rsquo increase

finite shared output link buffers

Host A lout

Causescosts of congestion scenario 3

Host B

Host CHost D

lin original data

lin original data plus retransmitted data

A as red linrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput g 0

Transport Layer

3-29

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

l out

linrsquo

Transport Layer

3-30

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end congestion

controlno explicit

feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer

3-31Case study ATM ABR congestion

controlABR available bit rateldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should use

available bandwidth

if senderrsquos path congested sender throttled to

minimum guaranteed rate

RM (resource management) cellssent by sender interspersed

with data cellsbits in RM cell set by switches

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild

congestion) CI bit congestion indication

RM cells returned to sender by receiver with bits intact

Transport Layer

3-32Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in

returned RM cell

RM cell data cell

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control additive increase multiplicative decrease approach sender increases transmission

rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half

after loss

cwnd

TCP

sen

der

cong

estio

n w

indo

w s

ize

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer

3-35TCP Congestion Control details

sender limits transmission

cwnd is dynamic function of perceived network congestion

TCP sending rate

roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte

ACKed sent not-yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent-LastByteAcked

lt cwnd

sender sequence number space

rate~~cwndRTT bytessec

3-36 TCP Slow Start

  • when connection begins, increase rate exponentially until first loss event:
      • initially cwnd = 1 MSS
      • double cwnd every RTT
      • done by incrementing cwnd for every ACK received
  • summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B. One segment is sent in the first RTT, then two segments, then four segments.]
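The doubling per RTT follows directly from the per-ACK increment. The sketch below illustrates this, assuming every segment sent in a round is ACKed before the next round starts and no loss occurs; the function name and the choice to count in MSS units are illustrative.

```python
def slow_start_rounds(rounds, mss=1):
    """Return the window size (in the same units as mss) at the start of each RTT."""
    cwnd = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cwnd)
        acks = cwnd // mss      # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += mss         # +1 MSS per ACK, so cwnd doubles every RTT
    return sizes


if __name__ == "__main__":
    print(slow_start_rounds(4))
```

The first three values correspond to the one, two and four segments shown in the slide's timeline.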

3-37 TCP: detecting, reacting to loss

  • loss indicated by timeout:
      • cwnd set to 1 MSS
      • window then grows exponentially (as in slow start) to threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs: TCP Reno
      • dup ACKs indicate network capable of delivering some segments
      • cwnd is cut in half, window then grows linearly
  • TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)

3-38 TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event

3-39 Summary: TCP Congestion Control

Initialization (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start

State: slow start
  • new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • cwnd > ssthresh (Λ): go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

State: congestion avoidance
  • new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

State: fast recovery
  • duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  • new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
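The three-state machine on this slide can be sketched as a small Python class. This is an illustration, not a real TCP stack: segment transmission and retransmission are left out, windows are kept as floats in bytes, the `+ 3` in the fast-recovery entry is read as 3 MSS, and the class and method names are invented for the example.

```python
class RenoSender:
    """Toy model of the slow start / congestion avoidance / fast recovery FSM."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss                      # illustrative MSS in bytes
        self.cwnd = float(mss)              # cwnd = 1 MSS
        self.ssthresh = float(ssthresh)     # slide uses 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh       # leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss           # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            # roughly +1 MSS per RTT, spread over the ACKs of one window
            self.cwnd += self.mss * (self.mss / self.cwnd)
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"    # a real sender retransmits here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow_start"           # a real sender retransmits here
```

Feeding the object a sequence of `on_new_ack`, `on_dup_ack` and `on_timeout` calls and printing `state` and `cwnd` after each one traces a path through the diagram.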

3-40 TCP throughput

  • avg. TCP throughput as function of window size, RTT?
      • ignore slow start, assume always data to send
  • W: window size (measured in bytes) where loss occurs
      • avg. window size (# in-flight bytes) is 3/4 W
      • avg. throughput is 3/4 W per RTT

$$\text{avg TCP throughput} = \frac{3}{4}\,\frac{W}{\text{RTT}} \ \text{bytes/sec}$$

[Figure: sawtooth of window size oscillating between W/2 and W.]
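The 3/4 factor comes from averaging a window that climbs linearly from W/2 to W. A short numeric check, assuming an idealized sawtooth sampled once per step (the helper names are illustrative):

```python
def sawtooth_average(w):
    """Average of a window that steps linearly from w/2 up to w (w even)."""
    samples = range(w // 2, w + 1)
    return sum(samples) / len(samples)


def avg_tcp_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec for loss window w_bytes and RTT rtt_s."""
    return 0.75 * w_bytes / rtt_s


if __name__ == "__main__":
    print(sawtooth_average(100))              # average window for W = 100
    print(avg_tcp_throughput(100_000, 0.1))   # bytes/sec for W = 100 kB, RTT = 100 ms
```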

3-41 TCP Futures: TCP over "long, fat pipes"

  • example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

$$\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
  • new versions of TCP for high-speed
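Both numbers in this example can be re-derived from the stated inputs. The sketch below assumes throughput is measured in bits per second and MSS in bytes, and inverts the loss-based throughput formula to solve for L; the function names are illustrative.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))  # about 83,333 segments
    print(required_loss_rate(10e9, 0.1, 1500))        # about 2e-10
```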

3-42 TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

3-43 Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
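The convergence argument can be checked numerically. In the sketch below, two flows start with very unequal rates; additive increase leaves the gap between them unchanged, while every multiplicative decrease halves it. The simulation assumes synchronized losses whenever the combined rate exceeds the capacity, which is an idealization, and all names are illustrative.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve, so the gap halves
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase: gap unchanged
    return x1, x2


if __name__ == "__main__":
    a, b = two_flow_aimd(1.0, 80.0, 100.0, 500)
    print(a, b, abs(a - b))
```

After enough loss events the two rates are nearly equal, which is the movement toward the equal bandwidth share line in the figure.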

3-44 Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      • do not want rate throttled by congestion control
  • instead use UDP:
      • send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets R/2
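The two shares in the parallel-connection example follow from assuming the bottleneck is split equally per connection. A quick check with exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing_conns, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing_conns + new_conns)


if __name__ == "__main__":
    print(app_share(9, 1))    # share with one connection
    print(app_share(9, 11))   # share with eleven connections
```

With 11 connections the exact share is 11/20 of R, which the slide rounds to R/2.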

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP: retransmission scenarios
  • TCP: retransmission scenarios (2)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake: FSM
  • TCP: closing a connection
  • TCP: closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study: ATM ABR congestion control
  • Case study: ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control: additive increase, multiplicative decre
  • TCP Congestion Control: details
  • TCP Slow Start
  • TCP: detecting, reacting to loss
  • TCP: switching from slow start to CA
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair?
  • Fairness (more)

3-3 TCP sender events

data rcvd from app:
  • create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running
      • think of timer as for oldest unacked segment
      • expiration interval: TimeOutInterval

timeout:
  • retransmit segment that caused timeout
  • restart timer

ack rcvd:
  • if ack acknowledges previously unacked segments
      • update what is known to be ACKed
      • start timer if there are still unacked segments

3-4 TCP sender (simplified)

Initialization (Λ):
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

State: wait for event

Event: data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer

Event: timeout
    retransmit not-yet-acked segment with smallest seq. #
    start timer

Event: ACK received, with ACK field value y
    if (y > SendBase) {
        SendBase = y
        /* SendBase-1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else stop timer
    }
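The three event handlers of the simplified sender translate almost line for line into Python. The sketch below is illustrative only: the timer is a boolean flag rather than a real clock, "pass segment to IP" is modeled by appending to a list, and the class, attribute and method names are invented for the example.

```python
class SimpleTcpSender:
    """Toy version of the simplified TCP sender (no flow or congestion control)."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq     # NextSeqNum
        self.send_base = initial_seq    # SendBase
        self.unacked = {}               # seq -> payload of not-yet-acked segments
        self.timer_running = False
        self.sent = []                  # log of (seq, payload) "passed to IP"

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            self.timer_running = False
            return
        seq = min(self.unacked)         # not-yet-acked segment with smallest seq
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True       # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y          # bytes up to y-1 are cumulatively ACKed
            done = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for s in done:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)
```

Starting the sender at sequence number 92 and sending 8 bytes followed by 20 bytes reproduces the sequence numbers used in the retransmission scenarios on the following slides.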

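3-5 TCP: retransmission scenarios

lost ACK scenario (Host A sends, Host B ACKs):
  1. A sends Seq=92, 8 bytes of data
  2. B replies ACK=100, but the ACK is lost (X)
  3. A's timer expires (timeout) and A resends Seq=92, 8 bytes of data
  4. B replies ACK=100 again

premature timeout scenario (Host A sends, Host B ACKs):
  1. A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  2. B replies ACK=100 and ACK=120, but A's timer expires before the first ACK arrives
  3. A resends Seq=92, 8 bytes of data
  4. B replies with the cumulative ACK=120

SendBase values marked on the sender timelines in the figure: 92, 100, 120, 120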

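3-6 TCP: retransmission scenarios

cumulative ACK scenario (Host A sends, Host B ACKs):
  1. A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  2. B's ACK=100 is lost (X), but ACK=120 reaches A before the timeout
  3. ACK=120 cumulatively acknowledges both segments, so nothing is retransmitted
  4. A continues with Seq=120, 15 bytes of data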

3-7 TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: arrival of segment that partially or completely fills gap.
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap.

3-8 TCP fast retransmit

  • time-out period often relatively long:
      • long delay before resending lost packet
  • detect lost segments via duplicate ACKs
      • sender often sends many segments back-to-back
      • if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  • likely that unacked segment lost, so don't wait for timeout
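The trigger condition can be sketched as a counter over the stream of ACK numbers arriving at the sender. The sketch assumes "triple duplicate" means three ACKs that repeat an earlier ACK number, that is, four identical ACK numbers in a row; the function name is illustrative.

```python
def fast_retransmit_points(acks):
    """Return the indices in `acks` at which a fast retransmit would fire."""
    fired = []
    last_ack, dup_count = None, 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:          # third duplicate of the same ACK number
                fired.append(i)
        else:
            last_ack, dup_count = ack, 0
    return fired


if __name__ == "__main__":
    # One original ACK=100 followed by three duplicates, then progress to 120.
    print(fast_retransmit_points([100, 100, 100, 100, 120]))
```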

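3-9 TCP fast retransmit

fast retransmit after sender receipt of triple duplicate ACK (Host A sends, Host B ACKs):
  1. A sends Seq=92, 8 bytes of data; B replies ACK=100
  2. A sends Seq=100, 20 bytes of data, which is lost (X)
  3. B sends three duplicate ACKs, each with ACK=100
  4. on the third duplicate ACK, A resends Seq=100, 20 bytes of data, before the timeout expires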

3-10 TCP flow control

  • application may remove data from TCP socket buffers ...
      • ... slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

[Figure: receiver protocol stack. Data from the sender passes through IP code and TCP code in the OS into the TCP socket receiver buffers, from which the application process reads.]

3-11 TCP flow control

  • receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
      • RcvBuffer size set via socket options (typical default is 4096 bytes)
      • many operating systems autoadjust RcvBuffer
  • sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  • guarantees receive buffer will not overflow

[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which is divided into buffered data and free buffer space (rwnd); buffered data is passed to the application process.]
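The bookkeeping behind rwnd can be sketched in a few lines. The variable names `last_byte_rcvd` and `last_byte_read` do not appear on the slide; they are assumed here as the usual way to express how much data is sitting in the receive buffer.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space (rwnd) the receiver advertises to the sender."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


if __name__ == "__main__":
    rwnd = advertised_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(rwnd)
    print(sender_may_send(5000, 4000, rwnd, 1000))
```

Because the sender never has more than rwnd bytes outstanding, the receive buffer cannot overflow.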

3-12 Connection Management

before exchanging data, sender/receiver "handshake":
  • agree to establish connection (each knowing the other willing to establish connection)
  • agree on connection parameters

State held at each end (application above network):
    connection state: ESTAB
    connection variables:
        seq # client-to-server, server-to-client
        rcvBuffer size at server, client

client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();

3-13 Agreeing to establish a connection

2-way handshake:
  • "Let's talk" / "OK": both sides reach ESTAB
  • choose x, req_conn(x) / acc_conn(x): both sides reach ESTAB

Q: will 2-way handshake always work in network?
  • variable delays
  • retransmitted messages (e.g., req_conn(x)) due to message loss
  • message reordering
  • can't "see" other side

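3-14 Agreeing to establish a connection

2-way handshake failure scenarios:

Scenario 1: half-open connection
  1. client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB
  2. client retransmits req_conn(x)
  3. connection x completes: client terminates, server forgets x
  4. the retransmitted req_conn(x) arrives and the server enters ESTAB again: half-open connection (no client!)

Scenario 2: duplicate data accepted
  1. client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB
  2. client sends data(x+1) and the server accepts data(x+1); client retransmits req_conn(x) and data(x+1)
  3. connection x completes: client terminates, server forgets x
  4. the retransmitted req_conn(x) arrives and the server enters ESTAB; the retransmitted data(x+1) arrives and the server accepts data(x+1) again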

3-15 TCP 3-way handshake

Initial states: client LISTEN, server LISTEN

  1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state becomes SYNSENT
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state becomes SYN RCVD
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state becomes ESTAB
  4. server: received ACK(y) indicates client is live; server state becomes ESTAB
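The sequence and acknowledgment numbers exchanged in the three segments can be written out as plain data. This is a sketch of the header fields only; the dictionary keys are illustrative and no real sockets are involved.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(x=100, y=300):
        print(segment)
```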

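3-16 TCP 3-way handshake: FSM

Client side:
  • closed -> SYN sent
      on: Socket clientSocket = newSocket("hostname", "port number");
      action: send SYN(seq=x)
  • SYN sent -> ESTAB
      on: receive SYNACK(seq=y, ACKnum=x+1)
      action: send ACK(ACKnum=y+1)

Server side:
  • closed -> listen
      on: Socket connectionSocket = welcomeSocket.accept();
      action: Λ (none)
  • listen -> SYN rcvd
      on: receive SYN(x)
      action: send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  • SYN rcvd -> ESTAB
      on: receive ACK(ACKnum=y+1)
      action: Λ (none)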

3-17 TCP: closing a connection

  • client, server each close their side of connection
      • send TCP segment with FIN bit = 1
  • respond to received FIN with ACK
      • on receiving FIN, ACK can be combined with own FIN
  • simultaneous FIN exchanges can be handled

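3-18 TCP: closing a connection

Both client and server start in ESTAB.

  1. client calls clientSocket.close() and sends FINbit=1, seq=x; client enters FIN_WAIT_1 (can no longer send but can receive data)
  2. server replies ACKbit=1, ACKnum=x+1 and enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
  3. server sends FINbit=1, seq=y and enters LAST_ACK (can no longer send data)
  4. client replies ACKbit=1, ACKnum=y+1 and enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
  5. server receives the ACK and enters CLOSED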

3-19 Principles of congestion control

3-20 Principles of congestion control

congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations:
      • lost packets (buffer overflow at routers)
      • long delays (queueing in router buffers)
  • a top-10 problem!

3-21 Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • output link capacity: R
  • no retransmission

  • maximum per-connection throughput: R/2
  • large delays as arrival rate, λ_in, approaches capacity

[Figure: Host A and Host B send original data at rate λ_in into a router with unlimited shared output link buffers; throughput is λ_out. Plot 1: λ_out versus λ_in, both axes marked R/2. Plot 2: delay versus λ_in, with R/2 marked on the λ_in axis.]

3-22 Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of timed-out packet
      • application-layer input = application-layer output: λ_in = λ_out
      • transport-layer input includes retransmissions: λ'_in >= λ_in

[Figure: Host A and Host B send through a router with finite shared output link buffers. λ_in: original data. λ'_in: original data, plus retransmitted data. λ_out: output.]

3-23 Causes/costs of congestion: scenario 2

idealization: perfect knowledge
  • sender sends only when router buffers available

[Figure: Host A keeps a copy of each packet and sends to Host B through a router with finite shared output link buffers; free buffer space is available. λ_in: original data. λ'_in: original data, plus retransmitted data. Plot: λ_out versus λ_in, both axes marked R/2.]

3-24 Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A keeps a copy of each packet and sends to Host B through a router with no buffer space. λ_in: original data. λ'_in: original data, plus retransmitted data.]

3-25 Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost
  • when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)

[Figure: Host A sends to Host B through a router with free buffer space. λ_in: original data. λ'_in: original data, plus retransmitted data. Plot: λ_out versus λ'_in, both axes marked R/2.]

3-26 Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered
  • when sending at R/2, some packets are retransmissions, including duplicates that are delivered!

[Figure: Host A keeps a copy of each packet, times out, and resends to Host B through a router with free buffer space. Plot: λ_out versus λ'_in, both axes marked R/2.]

3-27 Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered
  • when sending at R/2, some packets are retransmissions, including duplicates that are delivered!

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt
      • decreasing goodput

[Figure: plot of λ_out versus λ'_in, both axes marked R/2.]

3-28 Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0

[Figure: Host A, Host B, Host C and Host D connected through routers with finite shared output link buffers. λ_in: original data. λ'_in: original data, plus retransmitted data. λ_out is marked at Host A.]

3-29 Causes/costs of congestion: scenario 3

another "cost" of congestion:
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: plot of λ_out versus λ'_in, both axes marked C/2.]

3-30 Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      • explicit rate for sender to send at


Page 4: Congestion in networks

Transport Layer

3-4TCP sender (simplified)

waitfor event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

L

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) start timer else stop timer

ACK received with ACK field value y

Transport Layer

3-5TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtimeo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

timeo

utACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100SendBase=120

SendBase=120

SendBase=92

Transport Layer

3-6TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

timeo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer

3-7TCP ACK generation [RFC 1122 RFC 2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer

3-8TCP fast retransmittime-out period often

relatively long long delay before

resending lost packet

detect lost segments via duplicate ACKs sender often sends many

segments back-to-back if segment is lost there will

likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq likely that unacked

segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer

3-9

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

timeo

ut

ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer

3-10

TCP flow controlapplicationprocess

TCP socketreceiver buffers

TCPcode

IPcode

applicationOS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer

3-11

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application processreceiver ldquoadvertisesrdquo free

buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket

options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer

sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer

3-12Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquoagree to establish connection (each knowing the

other willing to establish connection)agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer

3-13

Q will 2-way handshake always work in network

variable delaysretransmitted messages (eg

req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talkOK ESTAB

ESTAB

choose x req_conn(x)ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer

3-14

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

esserverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose x req_conn(x)ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose x req_conn(x)ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer

3-15TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data received ACK(y)

indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTENserver state

LISTEN

Transport Layer

3-16

TCP 3-way handshake FSM

closed

L

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client

SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)

L

Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer

3-19Principles of congestion control

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Transport Layer

3-24

lin original dataloutlin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer

3-25

lin original dataloutlin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2lin

l out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer

3-26

A

lin loutlincopy

free buffer space

timeout

R2

R2lin

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer

3-27

Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

"costs" of congestion:
  • more work (retransmissions) for a given "goodput"
  • unneeded retransmissions: link carries multiple copies of a packet, decreasing goodput

[Plot: λout versus λin, both axes running to R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered!]

Transport Layer

3-28

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as λin and λ'in increase?

A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0

[Figure: Hosts A, B, C and D send over multihop paths through routers with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout at the receiving host.]

Transport Layer

3-29

Causes/costs of congestion: scenario 3

another "cost" of congestion:
  • when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted!

[Plot: λout versus λ'in, both axes running to C/2.]

Transport Layer

3-30

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      • explicit rate for sender to send at

Transport Layer

3-31

Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path is "underloaded": sender should use available bandwidth
  • if sender's path is congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      • NI bit: no increase in rate (mild congestion)
      • CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

Transport Layer

3-32

Case study: ATM ABR congestion control

  • two-byte ER (explicit rate) field in RM cell
      • congested switch may lower ER value in cell
      • sender's send rate is thus the maximum supportable rate on the path
  • EFCI bit in data cells: set to 1 in congested switch
      • if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure: RM cells interspersed with data cells along the path.]

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control: additive increase, multiplicative decrease

  • approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
      • additive increase: increase cwnd by 1 MSS every RTT until loss detected
      • multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD sawtooth behavior, probing for bandwidth. Vertical axis: cwnd, the TCP sender congestion window size; horizontal axis: time. The window size increases additively until loss occurs, then is cut in half.]
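The sawtooth can be reproduced with a few lines of Python. This is a minimal sketch, not TCP itself: it counts cwnd in whole MSS units, advances one RTT per step, and assumes a hypothetical fixed capacity `loss_at` at which a loss is always detected. The names `aimd_trace` and `loss_at` are illustrative and do not come from the slides.

```python
def aimd_trace(rounds, loss_at, start_cwnd=1):
    """Return cwnd (in MSS) at the start of each RTT round.

    loss_at is a hypothetical capacity: a loss is assumed whenever
    cwnd has reached this many MSS during a round.
    """
    cwnd = start_cwnd
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease: halve
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rounds=12, loss_at=8, start_cwnd=4)
print(trace)
```

The printed list climbs by one per round and drops back to half after each assumed loss, which is the shape the figure describes.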

Transport Layer

3-35

TCP Congestion Control: details

  • sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
  • cwnd is dynamic, a function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  • rate ≈ cwnd / RTT bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), and last byte sent; the in-flight span is bounded by cwnd.]

Transport Layer

3-36

TCP Slow Start

  • when connection begins, increase rate exponentially until first loss event:
      • initially cwnd = 1 MSS
      • double cwnd every RTT
      • done by incrementing cwnd for every ACK received
  • summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B. A sends one segment, then two segments one RTT later, then four segments.]
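A small sketch shows why incrementing cwnd once per ACK doubles it every RTT. It assumes that every segment sent in a round is ACKed individually before the next round (no loss, no delayed ACKs); the function name `slow_start_rounds` is made up for this illustration.

```python
def slow_start_rounds(rtts, mss=1):
    """Return cwnd (in units of mss) at the start of each RTT round."""
    cwnd = mss
    sizes = []
    for _ in range(rtts):
        sizes.append(cwnd)
        acks = cwnd // mss        # assumption: one ACK per segment sent
        for _ in range(acks):
            cwnd += mss           # slow start: +1 MSS for every ACK
    return sizes


print(slow_start_rounds(4))
```

With these assumptions the window grows 1, 2, 4, 8 segments, matching the one, two, four segments in the figure.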

Transport Layer

3-37

TCP: detecting, reacting to loss

  • loss indicated by timeout:
      • cwnd set to 1 MSS
      • window then grows exponentially (as in slow start) to threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs: TCP RENO
      • dup ACKs indicate network capable of delivering some segments
      • cwnd is cut in half, window then grows linearly
  • TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)

Transport Layer

3-38

TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?

A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event

Transport Layer

3-39

Summary: TCP Congestion Control

[Figure: finite state machine with three states. Λ denotes "no event" or "no action".]

initial transition into slow start (Λ):
  • cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0

slow start:
  • new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
  • cwnd > ssthresh (Λ): go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

congestion avoidance:
  • new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

fast recovery:
  • duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  • new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
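The state machine on this slide can be sketched as a small Python class. This is an illustrative model only: it tracks cwnd, ssthresh and dupACKcount but does not send or retransmit anything, the `MSS` value of 1460 bytes is an assumed segment size, and the class and method names (`RenoSender`, `on_new_ack`, and so on) are invented for the example.

```python
MSS = 1460  # bytes; an assumed segment size, not taken from the slides


class RenoSender:
    """Window bookkeeping for slow start, congestion avoidance, fast recovery."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                       # exponential growth
        else:
            self.cwnd += MSS * MSS / self.cwnd     # about +1 MSS per RTT
        self.dup_acks = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"           # missing segment would be resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                  # missing segment would be resent here


s = RenoSender()
for _ in range(3):
    s.on_new_ack()
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd, s.ssthresh)
```

Driving the object with a sequence of events is a quick way to check one's reading of the diagram, for example that three duplicate ACKs move the sender into fast recovery with cwnd = ssthresh + 3 MSS.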

Transport Layer

3-40

TCP throughput

  • avg. TCP throughput as a function of window size and RTT?
      • ignore slow start, assume there is always data to send
  • W: window size (measured in bytes) where loss occurs
      • avg. window size (number of in-flight bytes) is 3/4 W
      • avg. throughput is 3/4 W per RTT

avg TCP throughput = (3/4) · W / RTT bytes/sec

[Figure: sawtooth of the window size oscillating between W/2 and W.]
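The 3/4 factor is the mean of a window that ramps linearly from W/2 up to W. The sketch below checks that numerically and then applies the formula; the function names and the sample values (W = 100,000 bytes, RTT = 0.1 s) are illustrative assumptions, not figures from the slides.

```python
def avg_window(w, steps=1000):
    """Mean of a sawtooth that rises linearly from w/2 to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


print(avg_window(100_000))           # close to 75000
print(avg_throughput(100_000, 0.1))  # close to 750000 bytes/sec
```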

Transport Layer

3-41

TCP Futures: TCP over "long, fat pipes"

  • example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

TCP throughput = 1.22 · MSS / (RTT · √L)

  • to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10, a very small loss rate!
  • new versions of TCP for high-speed
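Both numbers on this slide follow from simple arithmetic, which the sketch below reproduces. The function names are made up for the example; the formula is the one quoted on the slide, with MSS converted to bits so that the result is in bits per second.

```python
import math


def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to fill the pipe: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Solve the same formula for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss(10e9, 0.1, 1500)
print(int(w), loss)
```

The window comes out at about 83,333 segments and the loss probability at roughly 2·10^-10, in line with the slide.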

Transport Layer

3-42

TCP Fairness

fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Transport Layer

3-43

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running from 0 to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line.]
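The convergence argument can be checked with a toy simulation. It is a sketch under strong assumptions: both flows have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units. The function name `two_flow_aimd` and the starting rates are invented for the example.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: the gap is unchanged
    return x1, x2


x1, x2 = two_flow_aimd(10, 70, capacity=100, rounds=200)
print(x1, x2)
```

Additive increase never changes the difference between the two rates, while every halving cuts it in two, so the flows drift toward the equal-share line.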

Transport Layer

3-44

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      • do not want rate throttled by congestion control
  • instead use UDP:
      • send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets R/2
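The parallel-connection example is per-connection fair sharing applied to the total connection count. A minimal sketch, assuming every TCP connection on the link ends up with an equal share (the helper name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20, i.e. roughly R/2
```

With 11 of the 20 connections the new application holds 11/20 of the link, which the slide rounds to R/2.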

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 5: Congestion in networks

Transport Layer

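3-5

TCP: retransmission scenarios

[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes of data. Host B replies ACK=100, but the ACK is lost (X). A's timer expires and it resends Seq=92, 8 bytes of data. B replies ACK=100 again.]

[Figure, premature timeout: Host A (SendBase=92) sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data. Host B replies ACK=100 and ACK=120, but A's timer expires before they arrive, so A resends Seq=92, 8 bytes of data. The ACKs then move SendBase to 100 and to 120. B answers the resent segment with a cumulative ACK=120, and SendBase stays at 120.]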

Transport Layer

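3-6

TCP: retransmission scenarios

[Figure, cumulative ACK: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data. Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is resent. A continues with Seq=120, 15 bytes of data.]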
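The scenarios above can be replayed against a small model of the simplified TCP sender from the earlier slide (SendBase, NextSeqNum, a single timer). This is a sketch of the bookkeeping only: nothing is transmitted, the timer is a boolean flag, and the class and method names are chosen for this example.

```python
class SimpleSender:
    """Simplified TCP sender: cumulative ACKs, one retransmission timer."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq   # oldest unACKed byte
        self.next_seq = initial_seq
        self.unacked = {}              # seq number -> segment length
        self.timer_running = False

    def send(self, length):
        seq = self.next_seq
        self.unacked[seq] = length
        self.next_seq += length
        self.timer_running = True      # start timer if not already running
        return seq

    def on_ack(self, y):
        if y > self.send_base:         # ACK covers previously unACKed data
            self.send_base = y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)

    def on_timeout(self):
        if not self.unacked:
            return None
        self.timer_running = True      # restart timer
        return min(self.unacked)       # resend segment with smallest seq number


a = SimpleSender(92)
a.send(8)        # Seq=92, 8 bytes
a.send(20)       # Seq=100, 20 bytes
a.on_ack(120)    # ACK=100 was lost; the cumulative ACK=120 covers both segments
print(a.send_base, a.timer_running, a.send(15))
```

In the cumulative ACK scenario the single ACK=120 advances SendBase past both segments, so the next segment goes out with Seq=120 and no retransmission is needed.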

Transport Layer

3-7

TCP ACK generation [RFC 1122, RFC 2581]

event at receiver -> TCP receiver action

  • arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
      -> delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
  • arrival of in-order segment with expected seq #; one other segment has ACK pending
      -> immediately send single cumulative ACK, ACKing both in-order segments
  • arrival of out-of-order segment with higher-than-expected seq #; gap detected
      -> immediately send duplicate ACK, indicating seq # of next expected byte
  • arrival of segment that partially or completely fills gap
      -> immediately send ACK, provided that segment starts at lower end of gap

Transport Layer

3-8

TCP fast retransmit

  • time-out period often relatively long:
      • long delay before resending lost packet
  • detect lost segments via duplicate ACKs
      • sender often sends many segments back-to-back
      • if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unACKed segment with smallest seq #
  • likely that unACKed segment was lost, so don't wait for timeout

Transport Layer

3-9

TCP fast retransmit

[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X), followed by further segments. Host B replies ACK=100 four times. A resends Seq=100, 20 bytes of data before its timeout expires.]
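Counting duplicate ACKs takes only a few lines. The sketch below assumes the reading shown in the figure, where the retransmission is triggered by the third duplicate, that is, the fourth ACK carrying the same number. The function name and the list-based interface are illustrative.

```python
def fast_retransmit_events(acks, threshold=3):
    """Return the ACK numbers at which a fast retransmit would fire."""
    last_ack = None
    dup_count = 0
    triggers = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                triggers.append(ack)   # resend the segment starting at this byte
        else:
            last_ack = ack             # new data ACKed: reset the counter
            dup_count = 0
    return triggers


print(fast_retransmit_events([100, 100, 100, 100]))
```

For the figure's ACK stream the function reports a single trigger at 100, the sequence number of the segment to resend.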

Transport Layer

3-10

TCP flow control

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

  • application may remove data from TCP socket buffers more slowly than the TCP receiver is delivering it (that is, than the sender is sending)

[Figure: receiver protocol stack. Segments arrive from the sender, pass through the IP code and TCP code in the OS into the TCP socket receiver buffers, and are read from there by the application process.]

Transport Layer

3-11

TCP flow control

  • receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
      • RcvBuffer size set via socket options (typical default is 4096 bytes)
      • many operating systems autoadjust RcvBuffer
  • sender limits amount of unACKed ("in-flight") data to receiver's rwnd value
  • guarantees receive buffer will not overflow

[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which is split into buffered data and free buffer space (rwnd); buffered data is passed on to the application process.]
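As a rough illustration, the advertised window is the part of RcvBuffer not occupied by buffered data, and the sender keeps its in-flight bytes within it. The variable names `last_byte_rcvd` and `last_byte_read` follow the usual textbook bookkeeping and are assumptions here, since the slide only shows the buffer picture; the function names are likewise hypothetical.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space: RcvBuffer minus data received but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, rwnd_value, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd_value


window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(sender_may_send(5000, 4000, window, 1000))
print(sender_may_send(5000, 4000, window, 1200))
```

With 2000 bytes still buffered out of 4096, the receiver advertises 2096 bytes, so a sender with 1000 bytes in flight may send 1000 more but not 1200.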

Transport Layer

3-12

Connection Management

before exchanging data, sender/receiver "handshake":
  • agree to establish connection (each knowing the other is willing to establish connection)
  • agree on connection parameters

[Figure: client and server each hold, between application and network: connection state: ESTAB; connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server and client.]

client: Socket clientSocket = new Socket(hostname, port number);
server: Socket connectionSocket = welcomeSocket.accept();

Transport Layer

3-13

Agreeing to establish a connection

2-way handshake:

[Figure: "Let's talk" / "OK", after which both sides are ESTAB. In protocol terms: client chooses x and sends req_conn(x), server answers acc_conn(x), and both sides enter ESTAB.]

Q: will 2-way handshake always work in network?
  • variable delays
  • retransmitted messages (e.g. req_conn(x)) due to message loss
  • message reordering
  • can't "see" other side

Transport Layer

3-14

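Agreeing to establish a connection

2-way handshake failure scenarios:

[Figure, half open connection: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB. The client also retransmits req_conn(x). Connection x completes, the client terminates and the server forgets x. The retransmitted req_conn(x) then arrives and the server enters ESTAB again: a half open connection (no client!).]

[Figure, duplicate data accepted: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB and sends data(x+1), which the server accepts. The client also retransmits req_conn(x) and data(x+1). Connection x completes, the client terminates and the server forgets x. The retransmitted req_conn(x) arrives and the server enters ESTAB; the retransmitted data(x+1) arrives and the server accepts data(x+1) a second time.]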

Transport Layer

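3-15

TCP 3-way handshake

[Figure: message sequence between client and server; both columns are labelled LISTEN at the start.]

  • client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state becomes SYNSENT
  • server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state becomes SYN RCVD
  • client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1), this segment may contain client-to-server data; client state becomes ESTAB
  • server: received ACK(y) indicates client is live; server state becomes ESTAB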
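The exchange can be traced with a short simulation. It is only a sketch of the sequence-number arithmetic and state labels shown in the figure: segments are plain dictionaries, nothing is sent over a network, and the function name `three_way_handshake` and the sample sequence numbers are made up.

```python
def three_way_handshake(x, y):
    """Trace the SYN, SYNACK, ACK exchange; return final states and the log."""
    client, server = "LISTEN", "LISTEN"   # starting labels as drawn on the slide
    log = []

    syn = {"SYN": 1, "seq": x}
    client = "SYNSENT"
    log.append(("client->server", syn))

    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    server = "SYN RCVD"
    log.append(("server->client", synack))

    if synack["acknum"] == x + 1:          # SYNACK acknowledges our SYN
        ack = {"ACK": 1, "acknum": synack["seq"] + 1}
        client = "ESTAB"
        log.append(("client->server", ack))
        if ack["acknum"] == y + 1:         # ACK acknowledges the server's SYN
            server = "ESTAB"

    return client, server, log


client, server, log = three_way_handshake(x=1000, y=5000)
print(client, server)
for direction, segment in log:
    print(direction, segment)
```

Each side acknowledges the other's initial sequence number plus one, which is what lets both confirm that the peer is live.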

Transport Layer

3-16

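TCP 3-way handshake: FSM

[Figure: finite state machine with states closed, listen, SYN rcvd, SYN sent and ESTAB. Λ denotes "no action".]

  • closed -> listen: Socket connectionSocket = welcomeSocket.accept(); action: Λ
  • listen -> SYN rcvd: on SYN(x); action: SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client
  • SYN rcvd -> ESTAB: on ACK(ACKnum=y+1); action: Λ
  • closed -> SYN sent: Socket clientSocket = new Socket(hostname, port number); action: SYN(seq=x)
  • SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1); action: ACK(ACKnum=y+1)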

Transport Layer

3-17

TCP: closing a connection

  • client, server each close their side of connection
      • send TCP segment with FIN bit = 1
  • respond to received FIN with ACK
      • on receiving FIN, ACK can be combined with own FIN
  • simultaneous FIN exchanges can be handled

Transport Layer

3-18

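TCP: closing a connection

[Figure: message sequence between client and server, both starting in ESTAB.]

  • client: clientSocket.close(); sends FINbit=1, seq=x; client state FIN_WAIT_1 (can no longer send but can receive data)
  • server: replies ACKbit=1, ACKnum=x+1; server state CLOSE_WAIT (can still send data)
  • client: state FIN_WAIT_2 (wait for server close)
  • server: sends FINbit=1, seq=y; server state LAST_ACK (can no longer send data)
  • client: replies ACKbit=1, ACKnum=y+1; client state TIMED_WAIT (timed wait for 2 · max segment lifetime), then CLOSED
  • server: on receiving the ACK, state CLOSED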

Transport Layer

3-19

Principles of congestion control

Transport Layer

3-20

Principles of congestion control

congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations:
      • lost packets (buffer overflow at routers)
      • long delays (queueing in router buffers)
  • a top-10 problem!

Transport Layer

3-21

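Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • output link capacity: R
  • no retransmission

  • maximum per-connection throughput: R/2
  • large delays as arrival rate, λin, approaches capacity

[Figure: Host A and Host B each send original data at rate λin into a router with unlimited shared output link buffers; throughput at the receiver is λout. Plots: λout versus λin, leveling off at R/2, and delay versus λin, growing sharply as λin approaches R/2.]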

Page 6: Congestion in networks

Transport Layer

3-6TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

timeo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer

3-7TCP ACK generation [RFC 1122 RFC 2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer

3-8TCP fast retransmittime-out period often

relatively long long delay before

resending lost packet

detect lost segments via duplicate ACKs sender often sends many

segments back-to-back if segment is lost there will

likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq likely that unacked

segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer

3-9

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

timeo

ut

ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer

3-10

TCP flow controlapplicationprocess

TCP socketreceiver buffers

TCPcode

IPcode

applicationOS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer

3-11

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application processreceiver ldquoadvertisesrdquo free

buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket

options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer

sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer

3-12Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquoagree to establish connection (each knowing the

other willing to establish connection)agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer

3-13

Q will 2-way handshake always work in network

variable delaysretransmitted messages (eg

req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talkOK ESTAB

ESTAB

choose x req_conn(x)ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer

3-14

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

esserverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose x req_conn(x)ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose x req_conn(x)ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer

3-15TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data received ACK(y)

indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTENserver state

LISTEN

Transport Layer

3-16

TCP 3-way handshake FSM

closed

L

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client

SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)

L

Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer

3-19Principles of congestion control

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

3-24 Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A keeps a copy; the router has no buffer space; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$ at Host B]

3-25 Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: same setup as the previous slide, now with free buffer space at the router]
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to $R/2$: when sending at $R/2$, some packets are retransmissions, but asymptotic goodput is still $R/2$ (why?)]

3-26 Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A keeps a copy and times out; free buffer space at the router; $\lambda_{in}$, $\lambda'_{in}$, $\lambda_{out}$ as before]
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to $R/2$: when sending at $R/2$, some packets are retransmissions, including duplicates that are delivered]

3-27 Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to $R/2$: when sending at $R/2$, some packets are retransmissions, including duplicates that are delivered]

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput

3-28 Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0

[Figure: Hosts A, B, C and D share routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$ at the receiving host]

3-29 Causes/costs of congestion: scenario 3

another "cost" of congestion:
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to $C/2$]

3-30 Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      • explicit rate for sender to send at

3-31 Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path "underloaded": sender should use available bandwidth
  • if sender's path congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      • NI bit: no increase in rate (mild congestion)
      • CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

3-32 Case study: ATM ABR congestion control

  • two-byte ER (explicit rate) field in RM cell
      • congested switch may lower ER value in cell
      • senders' send rate thus max supportable rate on path
  • EFCI bit in data cells: set to 1 in congested switch
      • if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure: RM cells and data cells on the path between sender and receiver]

3-33 TCP congestion control

3-34 TCP congestion control: additive increase, multiplicative decrease

  • approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
      • additive increase: increase cwnd by 1 MSS every RTT until loss detected
      • multiplicative decrease: cut cwnd in half after loss

[Plot: cwnd (TCP sender congestion window size) versus time. AIMD saw tooth behavior, probing for bandwidth: additively increase window size until loss occurs (then cut window in half)]
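The saw tooth can be reproduced with a few lines of code. The following is only a sketch under simplifying assumptions: cwnd is counted in MSS, one update happens per RTT, and the rounds in which loss is detected are passed in by hand (the name `loss_at` is hypothetical, not something from the slides).

```python
def aimd_trace(rounds, loss_at, start=1, mss=1):
    """Return cwnd (in MSS) at the start of each RTT under AIMD.

    loss_at is a set of RTT indices in which a loss is detected
    (an illustrative input, not part of TCP itself).
    """
    cwnd = start
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_at:
            cwnd = max(mss, cwnd / 2)  # multiplicative decrease: halve cwnd
        else:
            cwnd = cwnd + mss          # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rounds=8, loss_at={3, 6}, start=8)
print(trace)
```

Each loss halves the window and each loss-free RTT adds one MSS, which gives the saw tooth shape described on the slide.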

3-35 TCP Congestion Control: details

  • sender limits transmission: LastByteSent - LastByteAcked <= cwnd
  • cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes

$\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec

[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; the in-flight span is bounded by cwnd]

3-36 TCP Slow Start

  • when connection begins, increase rate exponentially until first loss event:
      • initially cwnd = 1 MSS
      • double cwnd every RTT
      • done by incrementing cwnd for every ACK received
  • summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT]
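To see why incrementing cwnd once per ACK doubles it every RTT, here is a minimal sketch. It assumes every segment sent in a round is ACKed individually (no delayed ACKs, no loss) and counts cwnd in MSS.

```python
def slow_start_rounds(rounds, mss=1):
    """Return cwnd (in MSS) at the start of each RTT during slow start."""
    cwnd = 1 * mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cwnd)
        acks = cwnd // mss   # assumption: one ACK per segment sent this RTT
        cwnd += acks * mss   # +1 MSS for every ACK received
    return sizes


print(slow_start_rounds(4))
```

The first three values match the one, two and four segments in the figure.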

3-37 TCP: detecting, reacting to loss

  • loss indicated by timeout:
      • cwnd set to 1 MSS
      • window then grows exponentially (as in slow start) to threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs: TCP RENO
      • dup ACKs indicate network capable of delivering some segments
      • cwnd is cut in half, window then grows linearly
  • TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
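The difference between the two variants can be written as one small function. This is a simplified sketch: it counts in MSS, uses hypothetical event names ("timeout", "3dupacks"), and leaves out the temporary window inflation of fast recovery.

```python
def react_to_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, ssthresh) in MSS after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    ssthresh = cwnd / 2
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh       # restart from 1 MSS, grow exponentially to ssthresh
    return ssthresh, ssthresh    # Reno on 3 duplicate ACKs: halve, then grow linearly


print(react_to_loss(16, "timeout"))
print(react_to_loss(16, "3dupacks", variant="reno"))
print(react_to_loss(16, "3dupacks", variant="tahoe"))
```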

3-38 TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event

3-39 Summary: TCP Congestion Control

[FSM with three states: slow start, congestion avoidance, fast recovery]

initial transition (Λ) into slow start:
  • cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0

slow start:
  • new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • cwnd > ssthresh: Λ, go to congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery

congestion avoidance:
  • new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery

fast recovery:
  • duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  • new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
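The state machine on this slide translates fairly directly into code. The class below is a sketch, not a real TCP implementation: it counts everything in MSS (so the initial ssthresh of 64 KB becomes a plain number, and "ssthresh + 3" is read as 3 MSS), it does not send or retransmit anything, and the class and method names are made up for illustration.

```python
MSS = 1  # assumption: all quantities are counted in MSS


class RenoSender:
    """Tracks state, cwnd, ssthresh and dupACKcount for the three-state FSM."""

    def __init__(self, ssthresh=64):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS
            self.dup_acks = 0
            if self.cwnd > self.ssthresh:       # Λ transition on the slide
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * (MSS / self.cwnd)
            self.dup_acks = 0
        else:                                   # fast recovery
            self.cwnd = self.ssthresh
            self.dup_acks = 0
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                  # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender(ssthresh=4)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cwnd)
```

With ssthresh = 4, the fourth new ACK pushes cwnd to 5, which exceeds ssthresh and moves the sender into congestion avoidance.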

3-40 TCP throughput

  • avg. TCP thruput as function of window size, RTT?
      • ignore slow start, assume always data to send
  • W: window size (measured in bytes) where loss occurs
      • avg. window size (# in-flight bytes) is 3/4 W
      • avg. thruput is 3/4 W per RTT

$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec

[Plot: window size saw tooth oscillating between $W/2$ and $W$]
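The 3/4 factor is just the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, as a sketch with made-up example values (W = 8 units for the mean, W = 100,000 bytes and RTT = 0.5 s for the rate):

```python
def sawtooth_mean(w):
    """Mean of a window that steps W/2, W/2 + 1, ..., W (w must be even)."""
    steps = range(w // 2, w + 1)
    return sum(steps) / len(steps)


def avg_throughput(w_bytes, rtt_s):
    """Average TCP throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


print(sawtooth_mean(8))               # equals 0.75 * 8
print(avg_throughput(100_000, 0.5))
```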

3-41 TCP Futures: TCP over "long, fat pipes"

  • example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$: a very small loss rate!
  • new versions of TCP for high-speed
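Both numbers on this slide can be checked by plugging the example values into the formulas. The sketch below assumes MSS is converted to bits so that it matches a rate in bits per second; the function names are illustrative.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def loss_rate_for(rate_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = loss_rate_for(10e9, 0.100, 1500)
print(int(w))   # about 83,333 segments
print(loss)     # roughly 2e-10
```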

3-42 TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

3-43 Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Plot: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
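A small simulation shows the convergence toward the equal-share line. It is a sketch under strong assumptions: both sessions have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve (assumed synchronized)
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: slope of 1
    return x1, x2


a, b = two_flow_aimd(1, 61, capacity=100, rounds=500)
print(a, b)
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the flows drift toward equal shares.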

3-44 Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      • do not want rate throttled by congestion control
  • instead use UDP:
      • send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets R/2
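The arithmetic behind the example, assuming the link is split evenly per connection (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R obtained by an app that opens new_conns
    connections next to `existing` others, assuming equal per-connection shares."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # 1/10 of R
print(app_share(9, 11))   # 11/20 of R, i.e. a little over R/2
```

With 11 of 20 connections the new app gets 11/20 of R, which the slide rounds to R/2.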

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 7: Congestion in networks

3-7 TCP ACK generation [RFC 1122, RFC 2581]

event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

event at receiver: arrival of out-of-order segment, higher-than-expect seq #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
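The four rows of the table can be read as a decision function. The sketch below is an illustration only: the parameter names and the returned strings are made up, and a segment below the expected sequence number (a case the table does not list) is rejected rather than guessed at.

```python
def receiver_action(seq, expected_seq, ack_pending=False, gap_exists=False):
    """Pick the receiver action from the table for one arriving segment."""
    if seq > expected_seq:
        return "immediate duplicate ACK"      # out-of-order segment, gap detected
    if seq < expected_seq:
        raise ValueError("case not covered by the table")
    if gap_exists:
        return "immediate ACK"                # segment starts at lower end of gap
    if ack_pending:
        return "immediate cumulative ACK"     # ACKs both in-order segments
    return "delayed ACK (wait up to 500 ms)"


print(receiver_action(100, 100))
print(receiver_action(120, 100))
```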

3-8 TCP fast retransmit

  • time-out period often relatively long:
      • long delay before resending lost packet
  • detect lost segments via duplicate ACKs
      • sender often sends many segments back-to-back
      • if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  • likely that unacked segment lost, so don't wait for timeout

3-9 TCP fast retransmit

[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B returns ACK=100 four times. Host A resends Seq=100, 20 bytes of data before the timeout expires]
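A sender-side duplicate ACK counter might look like the sketch below. It assumes "triple duplicate" means three ACKs that repeat an earlier one, which matches the four ACK=100 arrows in the figure; the function name and the list-of-ACK-numbers input are illustrative.

```python
def fast_retransmit_points(acks):
    """Return the ACK numbers at which a fast retransmit would fire."""
    last_ack, dups, fired = None, 0, []
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == 3:            # third duplicate of the same ACK
                fired.append(ack)    # resend the segment starting at byte `ack`
        else:
            last_ack, dups = ack, 0  # new ACK: reset the duplicate counter
    return fired


print(fast_retransmit_points([100, 100, 100, 100]))
```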

3-10 TCP flow control

[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code in the OS into the TCP socket receiver buffers, from which the application process reads]

  • application may remove data from TCP socket buffers slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

3-11 TCP flow control

  • receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
      • RcvBuffer size set via socket options (typical default is 4096 bytes)
      • many operating systems autoadjust RcvBuffer
  • sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  • guarantees receive buffer will not overflow

[Figure: receiver-side buffering. TCP segment payloads arrive into RcvBuffer, which holds buffered data plus free buffer space (rwnd); buffered data is passed to the application process]
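The rule can be sketched in two small functions. The byte counts are example values and the function names are illustrative; a real sender would also take cwnd into account.

```python
def advertised_rwnd(rcv_buffer, buffered):
    """Free receive-buffer space the receiver advertises as rwnd (bytes)."""
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rwnd = advertised_rwnd(rcv_buffer=4096, buffered=1096)
print(rwnd)
print(sender_may_send(2000, 0, rwnd, 1000))
print(sender_may_send(2000, 0, rwnd, 1001))
```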

3-12 Connection Management

before exchanging data, sender/receiver "handshake":
  • agree to establish connection (each knowing the other willing to establish connection)
  • agree on connection parameters

[Figure: client and server, each with an application and a network layer, each holding:
  connection state: ESTAB
  connection variables: seq # client-to-server and server-to-client; rcvBuffer size at server, client]

client: Socket clientSocket = new Socket(hostname, portNumber);
server: Socket connectionSocket = welcomeSocket.accept();

3-13 Agreeing to establish a connection

Q: will 2-way handshake always work in network?
  • variable delays
  • retransmitted messages (e.g. req_conn(x)) due to message loss
  • message reordering
  • can't "see" other side

[Figure: 2-way handshake. Informally: "Let's talk", "OK", then both sides are ESTAB. As messages: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB]

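3-14 Agreeing to establish a connection

2-way handshake failure scenarios:

[Figure, scenario 1: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB. Connection x completes, client terminates, server forgets x. A retransmitted req_conn(x) then arrives and the server enters ESTAB again: half open connection (no client!)]

[Figure, scenario 2: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB, sends data(x+1), and the server accepts data(x+1). Connection x completes, client terminates, server forgets x. A retransmitted req_conn(x) arrives and the server enters ESTAB; a retransmitted data(x+1) arrives and the server accepts data(x+1) a second time]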

3-15 TCP 3-way handshake

[Figure: client state goes LISTEN, SYNSENT, ESTAB; server state goes LISTEN, SYN RCVD, ESTAB]

1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); enter SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; enter ESTAB
4. server: received ACK(y) indicates client is live; enter ESTAB
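The sequence and acknowledgment numbers of the three segments follow mechanically from x and y. The sketch below builds them as plain dictionaries; the field names mirror the labels on the slide and the example values of x and y are arbitrary.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for client ISN x and server ISN y."""
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}
    return syn, synack, ack


for segment in three_way_handshake(x=10, y=500):
    print(segment)
```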

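3-16 TCP 3-way handshake: FSM

[FSM with states: closed, listen, SYN rcvd, SYN sent, ESTAB]

server side:
  • closed to listen: Socket connectionSocket = welcomeSocket.accept(); (action: Λ)
  • listen to SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  • SYN rcvd to ESTAB: on ACK(ACKnum=y+1) (action: Λ)

client side:
  • closed to SYN sent: Socket clientSocket = new Socket(hostname, portNumber); send SYN(seq=x)
  • SYN sent to ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)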

3-17 TCP: closing a connection

  • client, server each close their side of connection
      • send TCP segment with FIN bit = 1
  • respond to received FIN with ACK
      • on receiving FIN, ACK can be combined with own FIN
  • simultaneous FIN exchanges can be handled

Page 8: Congestion in networks

Transport Layer

3-8TCP fast retransmittime-out period often

relatively long long delay before

resending lost packet

detect lost segments via duplicate ACKs sender often sends many

segments back-to-back if segment is lost there will

likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq likely that unacked

segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer

3-9

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

timeo

ut

ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer

3-10

TCP flow controlapplicationprocess

TCP socketreceiver buffers

TCPcode

IPcode

applicationOS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer

3-11

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application processreceiver ldquoadvertisesrdquo free

buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket

options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer

sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer

3-12Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquoagree to establish connection (each knowing the

other willing to establish connection)agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer

3-13

Q will 2-way handshake always work in network

variable delaysretransmitted messages (eg

req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talkOK ESTAB

ESTAB

choose x req_conn(x)ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer

3-14

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

esserverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose x req_conn(x)ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose x req_conn(x)ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer

3-15TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data received ACK(y)

indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTENserver state

LISTEN

Transport Layer

3-16

TCP 3-way handshake FSM

closed

L

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client

SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)

L

Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer

3-19Principles of congestion control

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: a packet arrives when the router has no buffer space and is dropped; the sender resends its copy once there is free buffer space. Plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes marked at $R/2$]

when sending at $R/2$, some packets are retransmissions but asymptotic goodput is still $R/2$ (why?)

Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: sender times out and sends a copy while the router still has free buffer space. Plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes marked at $R/2$]

when sending at $R/2$, some packets are retransmissions, including duplicates that are delivered!

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt
      decreasing goodput

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0

[Figure: Hosts A, B, C and D send over multihop paths through finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$ at the receiver]

another "cost" of congestion:
  • when packet dropped, any upstream transmission capacity used for that packet was wasted!

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked at $C/2$]

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at

Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path "underloaded":
      sender should use available bandwidth
  • if sender's path congested:
      sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      NI bit: no increase in rate (mild congestion)
      CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

  • two-byte ER (explicit rate) field in RM cell
      congested switch may lower ER value in cell
      senders' send rate thus max supportable rate on path
  • EFCI bit in data cells: set to 1 in congested switch
      if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure: RM cells interspersed with data cells along the path]
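A minimal sketch of how the ER field ends up holding the maximum rate the path can support, assuming each switch simply lowers ER to its own supportable rate when that is smaller. The function name and the rate values are hypothetical; real ABR switches compute their supportable rate with their own algorithms.

```python
def explicit_rate_after_path(initial_er, switch_rates):
    """Return the ER value after the RM cell has crossed every switch.

    initial_er   -- rate the sender wrote into the RM cell
    switch_rates -- rate each switch on the path can support (assumed known)
    """
    er = initial_er
    for supportable in switch_rates:
        # A congested switch may lower ER; it never raises it.
        er = min(er, supportable)
    return er
```

With `initial_er=100.0` and switches supporting 80, 45 and 60, the returned cell carries 45: the bottleneck switch sets the sender's rate.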

TCP congestion control: additive increase, multiplicative decrease

  • approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
      additive increase: increase cwnd by 1 MSS every RTT until loss detected
      multiplicative decrease: cut cwnd in half after loss

AIMD saw tooth behavior: probing for bandwidth

[Plot: cwnd (TCP sender congestion window size) versus time; additively increase window size ... until loss occurs (then cut window in half)]
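The saw tooth can be reproduced with a few lines of simulation. This is a sketch under a strong simplification: a loss is assumed to be detected exactly when cwnd exceeds a fixed capacity. The function name `aimd_trace` and the parameter `capacity_mss` are made up for the example.

```python
def aimd_trace(capacity_mss, rounds, cwnd=1.0):
    """Return cwnd (in MSS) at the start of each RTT under idealized AIMD.

    Assumption: loss is detected exactly when cwnd exceeds capacity_mss.
    """
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_mss:
            cwnd = cwnd / 2   # multiplicative decrease: cut cwnd in half
        else:
            cwnd = cwnd + 1   # additive increase: 1 MSS per RTT
    return trace
```

With a capacity of 4 MSS, the window climbs 1, 2, 3, 4, 5, is halved to 2.5, and then climbs again.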

TCP Congestion Control: details

  • sender limits transmission:
      $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
  • cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes

$\text{rate} \approx \frac{\text{cwnd}}{RTT}$ bytes/sec

[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; cwnd spans the in-flight region]
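The window limit and the rate estimate translate directly into code. A small sketch; the function names are hypothetical and the check ignores the receiver's rwnd, which a real sender would also respect.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """rate ~ cwnd / RTT, in bytes per second."""
    return cwnd_bytes / rtt_seconds
```

For instance, with 2000 bytes in flight and cwnd = 4000, a 1500-byte segment may be sent but a 2500-byte one may not.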

TCP Slow Start

  • when connection begins, increase rate exponentially until first loss event:
      initially cwnd = 1 MSS
      double cwnd every RTT
      done by incrementing cwnd for every ACK received
  • summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends to Host B over successive RTTs: one segment, then two segments, then four segments]
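Why incrementing cwnd per ACK doubles it per RTT can be seen in a short sketch. It assumes no loss and one ACK per segment (no delayed ACKs); `slow_start_rounds` is a name invented for the example.

```python
def slow_start_rounds(mss, rtts):
    """cwnd (bytes) at the start of each RTT, with no loss and one ACK per segment."""
    cwnd = mss
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss   # one ACK comes back per segment sent this round
        cwnd += acks * mss   # +1 MSS per ACK, so cwnd doubles each RTT
    return history
```

With MSS = 1460 bytes the window goes 1460, 2920, 5840, 11680: one, two, four, eight segments.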

TCP: detecting, reacting to loss

  • loss indicated by timeout:
      cwnd set to 1 MSS
      window then grows exponentially (as in slow start) to threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs: TCP RENO
      dup ACKs indicate network capable of delivering some segments
      cwnd is cut in half, window then grows linearly
  • TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event

Summary: TCP Congestion Control

initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start:
  • new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • cwnd > ssthresh: (no action) go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

congestion avoidance:
  • new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

fast recovery:
  • duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  • new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1; dupACKcount = 0; retransmit missing segment; go to slow start
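The three-state machine can be sketched as a small class that tracks only cwnd, ssthresh and the duplicate-ACK count. This is an illustration, not a TCP implementation: actual segment transmission and retransmission are left out, the class and method names are hypothetical, the MSS value is arbitrary, and the slide's "ssthresh + 3" is read here as 3 MSS.

```python
MSS = 1460  # bytes; an illustrative value, not fixed by the slide


class RenoSender:
    """Tracks cwnd/ssthresh through slow start, congestion avoidance, fast recovery."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                       # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd     # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                       # each dup ACK means a segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS    # assumption: "+ 3" means 3 MSS
            self.state = "fast_recovery"           # missing segment would be retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"                  # missing segment would be retransmitted here
```

Feeding the object three new ACKs, three duplicate ACKs and one more new ACK moves it from slow start through fast recovery into congestion avoidance with cwnd equal to the halved window.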

TCP throughput

  • avg. TCP thruput as function of window size, RTT?
      ignore slow start, assume always data to send
  • W: window size (measured in bytes) where loss occurs
      avg. window size (# in-flight bytes) is 3/4 W
      avg. thruput is 3/4 W per RTT

$\text{avg TCP thruput} = \frac{3}{4} \frac{W}{RTT}$ bytes/sec

[Plot: window saw tooth oscillating between $W/2$ and $W$]
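The 3/4 factor is the mean of a window that ramps linearly from W/2 to W. A short sketch of that arithmetic, with hypothetical function names:

```python
def average_window(w):
    """Mean of a window that ramps linearly from W/2 up to W, then halves."""
    return (w / 2 + w) / 2   # = 3/4 * W


def avg_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return average_window(w_bytes) / rtt_seconds
```

For W = 8000 bytes the average window is 6000 bytes; at RTT = 0.5 s that gives 12000 bytes/sec.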

TCP Futures: TCP over "long, fat pipes"

  • example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
  • new versions of TCP for high-speed
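Both numbers on the slide can be checked from the stated parameters. The sketch below assumes MSS is converted to bits so that throughput comes out in bits per second; the function names are invented for the example.

```python
import math


def segments_in_flight(rate_bps, rtt_s, mss_bytes):
    """Window (in segments) needed to fill the pipe: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput(mss_bytes, rtt_s, loss):
    """throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Invert the formula above to get L for a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

`segments_in_flight(10e9, 0.1, 1500)` is about 83,333, and `required_loss_rate(10e9, 0.1, 1500)` is about 2.1e-10, which the slide rounds to 2·10^-10.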

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Plot: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
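The convergence toward the equal-share line can be checked numerically. This sketch assumes both sessions have the same RTT and see loss at the same moment, whenever their combined rate exceeds the capacity; `two_flow_aimd` is a hypothetical name.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Idealized synchronized AIMD for two flows sharing one bottleneck.

    Assumption: both flows detect loss whenever x1 + x2 > capacity.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap unchanged
    return x1, x2
```

Starting from very unequal rates such as 1 and 80 on a link of capacity 100, the difference between the two flows is halved at every loss event and becomes negligible after a few hundred rounds.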

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      do not want rate throttled by congestion control
  • instead use UDP:
      send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets R/2
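The parallel-connection example is a per-connection equal split. A sketch of the arithmetic, assuming every connection on the link gets the same share (the function name is made up):

```python
def app_share(existing_connections, new_connections):
    """Fraction of R the new app gets if every connection gets an equal share."""
    return new_connections / (existing_connections + new_connections)
```

With 9 existing connections, one new connection gets 1/10 of R, while eleven new connections get 11/20 of R, slightly more than the R/2 quoted on the slide.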


TCP fast retransmit

[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B returns ACK=100 four times. Host A resends Seq=100, 20 bytes of data, before its timeout expires]

fast retransmit after sender receipt of triple duplicate ACK
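Counting duplicate ACKs is the whole trigger. A minimal sketch, with a hypothetical function name, that reports which ACK numbers would cause a fast retransmit in a stream of received ACKs:

```python
def fast_retransmit_points(acks):
    """Return the ACK numbers for which a third duplicate arrived."""
    last_ack, dup_count, retransmits = None, 0, []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:
                retransmits.append(ack)   # resend the segment starting at byte `ack`
        else:
            last_ack, dup_count = ack, 0  # a new ACK resets the duplicate count
    return retransmits
```

In the figure's case the sender sees ACK=100 four times (one original plus three duplicates), so the segment starting at byte 100 is resent.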

TCP flow control

  • application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

[Figure: receiver protocol stack: application process on top of the TCP socket receiver buffers; TCP code and IP code in the OS; segments arrive from sender]

  • receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
      RcvBuffer size set via socket options (typical default is 4096 bytes)
      many operating systems autoadjust RcvBuffer
  • sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  • guarantees receive buffer will not overflow

[Figure: receiver-side buffering: RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads come in, data goes out to application process]
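Both sides of the mechanism fit in two small functions. This is a sketch: the slide names only rwnd and RcvBuffer, so the receiver counters `last_byte_rcvd` and `last_byte_read` are assumptions introduced here to express "buffered data", and the function names are hypothetical.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free space = buffer size minus data received but not yet read by the app."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable(rwnd_value, last_byte_sent, last_byte_acked):
    """Bytes the sender may still put in flight without exceeding rwnd."""
    return max(0, rwnd_value - (last_byte_sent - last_byte_acked))
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises rwnd = 2096; a sender with 1000 bytes in flight may then send at most 1096 more.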

Connection Management

before exchanging data, sender/receiver "handshake":
  • agree to establish connection (each knowing the other willing to establish connection)
  • agree on connection parameters

[Figure: client and server each hold, between application and network: connection state: ESTAB; connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]

client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();

Agreeing to establish a connection

2-way handshake:
  • "Let's talk" / "OK": both sides ESTAB
  • client chooses x, sends req_conn(x); server enters ESTAB and returns acc_conn(x); client enters ESTAB

Q: will 2-way handshake always work in network?
  • variable delays
  • retransmitted messages (e.g. req_conn(x)) due to message loss
  • message reordering
  • can't "see" other side

2-way handshake failure scenarios:

half open connection (no client):
  • client chooses x, sends req_conn(x); server enters ESTAB, returns acc_conn(x); client enters ESTAB
  • client retransmits req_conn(x)
  • connection x completes: client terminates, server forgets x
  • the retransmitted req_conn(x) arrives; server enters ESTAB again: half open connection (no client)

duplicate data accepted:
  • client chooses x, sends req_conn(x); server enters ESTAB, returns acc_conn(x); client enters ESTAB
  • client sends data(x+1); server accepts data(x+1)
  • client retransmits req_conn(x) and retransmits data(x+1)
  • connection x completes: client terminates, server forgets x
  • the retransmitted req_conn(x) arrives; server enters ESTAB; the retransmitted data(x+1) arrives; server accepts data(x+1) a second time

TCP 3-way handshake

client state / server state: both start in LISTEN

  • client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client enters SYNSENT
  • server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server enters SYN RCVD
  • client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client enters ESTAB
  • server: received ACK(y) indicates client is live; server enters ESTAB
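The sequence and acknowledgement numbers in the three segments follow a fixed pattern, which a short sketch makes explicit. The dictionary keys and the function name are hypothetical; only the flag and number relationships come from the slide.

```python
def three_way_handshake(x, y):
    """Return the three segments exchanged, as dicts of the header fields used.

    x -- client's initial sequence number
    y -- server's initial sequence number
    """
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]
```

For x = 1000 and y = 5000, the SYNACK carries ACKnum 1001 and the final ACK carries ACKnum 5001.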

TCP 3-way handshake: FSM

client side: closed → SYN sent → ESTAB
  • closed → SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  • SYN sent → ESTAB: receive SYNACK(seq=y, ACKnum=x+1); send ACK(ACKnum=y+1)

server side: closed → listen → SYN rcvd → ESTAB
  • closed → listen: Socket connectionSocket = welcomeSocket.accept();
  • listen → SYN rcvd: receive SYN(x); send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  • SYN rcvd → ESTAB: receive ACK(ACKnum=y+1); no action

TCP: closing a connection

  • client, server each close their side of connection
      send TCP segment with FIN bit = 1
  • respond to received FIN with ACK
      on receiving FIN, ACK can be combined with own FIN
  • simultaneous FIN exchanges can be handled

Page 10: Congestion in networks

Transport Layer

3-10

TCP flow controlapplicationprocess

TCP socketreceiver buffers

TCPcode

IPcode

applicationOS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer

3-11

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application processreceiver ldquoadvertisesrdquo free

buffer space by including rwnd value in TCP header of receiver-to-sender segmentsRcvBuffer size set via socket

options (typical default is 4096 bytes)

many operating systems autoadjust RcvBuffer

sender limits amount of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value

guarantees receive buffer will not overflow

receiver-side buffering

Transport Layer

3-12Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquoagree to establish connection (each knowing the

other willing to establish connection)agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer

3-13

Q will 2-way handshake always work in network

variable delaysretransmitted messages (eg

req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talkOK ESTAB

ESTAB

choose x req_conn(x)ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer

3-14

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

esserverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose x req_conn(x)ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose x req_conn(x)ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer

3-15TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data received ACK(y)

indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTENserver state

LISTEN

Transport Layer

3-16

TCP 3-way handshake FSM

closed

L

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client

SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)

L

Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer

3-19Principles of congestion control

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Transport Layer

3-24

lin original dataloutlin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer

3-25

lin original dataloutlin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2lin

l out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer

3-26

A

lin loutlincopy

free buffer space

timeout

R2

R2lin

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer

3-27

R2

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2lin

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer

3-28

four sendersmultihop paths timeoutretransmit

Q what happens as lin and lin

rsquo increase

finite shared output link buffers

Host A lout

Causescosts of congestion scenario 3

Host B

Host CHost D

lin original data

lin original data plus retransmitted data

A as red linrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput g 0

Transport Layer

3-29

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

l out

linrsquo

3-30 Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  • no explicit feedback from the network
  • congestion inferred from end-system observed loss and delay
  • approach taken by TCP

Network-assisted congestion control:
  • routers provide feedback to end systems:
      a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      an explicit rate for the sender to send at

3-31 Case study: ATM ABR congestion control

ABR (available bit rate) is an "elastic service":
  • if the sender's path is "underloaded", the sender should use the available bandwidth
  • if the sender's path is congested, the sender is throttled to a minimum guaranteed rate

RM (resource management) cells:
  • sent by the sender, interspersed with data cells
  • bits in the RM cell are set by switches ("network-assisted"):
      NI bit: no increase in rate (mild congestion)
      CI bit: congestion indication
  • RM cells are returned to the sender by the receiver, with bits intact

3-32 Case study: ATM ABR congestion control

  • two-byte ER (explicit rate) field in the RM cell:
      a congested switch may lower the ER value in the cell
      the sender's send rate is thus the maximum supportable rate on the path
  • EFCI bit in data cells: set to 1 in a congested switch
      if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell

[Figure: RM cells and data cells travelling along the path.]
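The ER/NI/CI handling described for ATM ABR can be sketched in a few lines. This is a minimal illustration, not an ATM implementation: the `RMCell` class, the `pass_through_switch` and `forward_path` helpers, and the rate values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RMCell:
    er: float          # explicit rate carried in the cell (units are arbitrary here)
    ni: bool = False   # no-increase bit (mild congestion)
    ci: bool = False   # congestion-indication bit


def pass_through_switch(cell, supportable_rate, mild=False, congested=False):
    """A switch may lower ER and set NI/CI; it never raises ER or clears a bit."""
    return RMCell(
        er=min(cell.er, supportable_rate),
        ni=cell.ni or mild,
        ci=cell.ci or congested,
    )


def forward_path(cell, switches):
    """Send the cell through each (rate, mild, congested) switch in turn."""
    for rate, mild, congested in switches:
        cell = pass_through_switch(cell, rate, mild, congested)
    return cell


# Three switches; the second is mildly congested and supports the lowest rate.
returned = forward_path(
    RMCell(er=100.0),
    [(80.0, False, False), (40.0, True, False), (60.0, False, False)],
)
print(returned)
```

The cell that comes back carries the minimum ER seen along the path, which is why the sender's rate ends up at the maximum rate the whole path can support.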

3-33 TCP congestion control

3-34 TCP congestion control: additive increase, multiplicative decrease

Approach: the sender increases its transmission rate (window size), probing for usable bandwidth, until loss occurs.
  • additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  • multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD sawtooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) is plotted against time: the window size increases additively until loss occurs, then is cut in half.]
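The sawtooth can be reproduced with a toy model. This is a sketch under stated assumptions: cwnd is counted in whole MSS, and a loss is assumed to occur in any RTT where cwnd exceeds a fixed `capacity_mss`; both the function name and that loss rule are illustrative, not part of TCP.

```python
def aimd_trace(capacity_mss, rounds, start_mss=1):
    """Return cwnd (in MSS) at the start of each RTT under idealized AIMD."""
    cwnd = start_mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_mss:
            # Loss detected in this RTT: multiplicative decrease.
            cwnd = max(1, cwnd // 2)
        else:
            # No loss: additive increase of 1 MSS per RTT.
            cwnd += 1
    return trace


trace = aimd_trace(capacity_mss=8, rounds=12, start_mss=5)
print(trace)  # [5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4]
```

The window climbs one MSS per RTT, overshoots the assumed capacity, is halved, and starts climbing again.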

3-35 TCP Congestion Control: details

  • the sender limits transmission: $LastByteSent - LastByteAcked \le cwnd$
  • cwnd is a dynamic function of perceived network congestion

TCP sending rate: roughly, send cwnd bytes, wait one RTT for ACKs, then send more bytes:

$$rate \approx \frac{cwnd}{RTT} \text{ bytes/sec}$$

[Figure: sender sequence number space, showing the last byte ACKed, bytes sent but not yet ACKed ("in-flight"), and the last byte sent; the in-flight span is bounded by cwnd.]
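As a quick worked example of the window limit and the rate approximation, the sketch below uses hypothetical helper names (`can_send`, `approx_rate`) and made-up numbers: a cwnd of 14,600 bytes and a 100 ms RTT.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps the in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one window of data per RTT."""
    return cwnd_bytes / rtt_seconds


# 3000 bytes are in flight and cwnd is 4000 bytes: 1000 more bytes fit, 1001 do not.
print(can_send(5000, 2000, 4000, 1000), can_send(5000, 2000, 4000, 1001))

# 14,600 bytes per 0.1 s is roughly 146,000 bytes/sec.
rate = approx_rate(cwnd_bytes=14600, rtt_seconds=0.1)
print(rate)
```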

3-36 TCP Slow Start

When a connection begins, increase the rate exponentially until the first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast.

[Figure: Host A sends one segment to Host B, then two segments, then four segments, one batch per RTT.]
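The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be checked with a short simulation. It assumes no loss and one ACK per segment; `slow_start_rounds` is a name invented for this sketch.

```python
def slow_start_rounds(num_rtts, mss=1):
    """cwnd at the start of each RTT when every segment is ACKed and nothing is lost."""
    cwnd = mss
    history = []
    for _ in range(num_rtts):
        history.append(cwnd)
        acks = cwnd // mss      # one ACK comes back per segment sent this round
        for _ in range(acks):
            cwnd += mss         # slow start: +1 MSS for every ACK received
    return history


print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```

Because a window of n segments produces n ACKs, adding one MSS per ACK doubles the window each round trip.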

3-37 TCP: detecting, reacting to loss

  • loss indicated by timeout:
      cwnd is set to 1 MSS
      the window then grows exponentially (as in slow start) to a threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs (TCP Reno):
      duplicate ACKs indicate the network is capable of delivering some segments
      cwnd is cut in half; the window then grows linearly
  • TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
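The difference between the two variants comes down to one branch. The sketch below is a simplification: it works in units of MSS, takes the threshold to be half of cwnd at the time of loss, and ignores the temporary window inflation that Reno applies during fast recovery. The function name and the event strings are hypothetical.

```python
def react_to_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, threshold) in MSS after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + event)
    threshold = cwnd / 2
    if event == "timeout" or variant == "tahoe":
        # Restart from 1 MSS and grow exponentially up to the threshold.
        return 1, threshold
    # Reno on 3 duplicate ACKs: halve the window, then grow linearly.
    return threshold, threshold


print(react_to_loss(16, "timeout", "reno"))    # (1, 8.0)
print(react_to_loss(16, "3dupacks", "reno"))   # (8.0, 8.0)
print(react_to_loss(16, "3dupacks", "tahoe"))  # (1, 8.0)
```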

3-38 TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

3-39 Summary: TCP Congestion Control

Initialization (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start:
  • new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • cwnd > ssthresh (Λ): go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start

Congestion avoidance:
  • new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery:
  • duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  • new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  • timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start
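The state machine summarized on this slide can be written as a small class. This is a sketch, not a TCP stack: it works in units of MSS (so `MSS = 1`), counts retransmissions instead of sending anything, and uses a demo ssthresh of 4 MSS rather than the slide's 64 KB. The class and method names are invented for the example.

```python
MSS = 1  # assumption: all window values are expressed in units of MSS


class RenoSender:
    def __init__(self, ssthresh=64.0):
        self.state = "slow_start"
        self.cwnd = 1.0 * MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.retransmissions = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS
            self.dup_acks = 0
            if self.cwnd > self.ssthresh:          # same test as on the slide
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * (MSS / self.cwnd)   # about +1 MSS per RTT
            self.dup_acks = 0
        else:                                      # fast recovery
            self.cwnd = self.ssthresh
            self.dup_acks = 0
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                       # inflate window per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.retransmissions += 1              # retransmit missing segment
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0 * MSS
        self.dup_acks = 0
        self.retransmissions += 1                  # retransmit missing segment
        self.state = "slow_start"


s = RenoSender(ssthresh=4.0)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cwnd)        # congestion_avoidance 5.0
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd)        # fast_recovery 5.5
s.on_new_ack()
print(s.state, s.cwnd)        # congestion_avoidance 2.5
s.on_timeout()
print(s.state, s.cwnd)        # slow_start 1.0
```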

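3-40 TCP throughput

Average TCP throughput as a function of window size and RTT (ignore slow start; assume there is always data to send):
  • W: window size (measured in bytes) where loss occurs
  • the average window size (number of in-flight bytes) is 3/4 W
  • the average throughput is 3/4 W per RTT

$$\text{avg TCP throughput} = \frac{3}{4} \cdot \frac{W}{RTT} \text{ bytes/sec}$$

[Figure: window size sawtooth oscillating between W/2 and W.]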
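The 3/4 factor is just the mean of a window that ramps linearly from W/2 to W. The sketch below checks this numerically; the function names and the sample values (W = 100,000 bytes, RTT = 100 ms) are illustrative assumptions.

```python
def average_window(w, steps=1000):
    """Mean of evenly spaced samples on a linear ramp from W/2 up to W."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec: 3/4 of W per RTT."""
    return 0.75 * w_bytes / rtt_seconds


print(average_window(100000))        # close to 75000, i.e. 3/4 of W
print(avg_throughput(100000, 0.1))   # close to 750000 bytes/sec
```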

3-41 TCP Futures: TCP over "long, fat pipes"

  • example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability L [Mathis 1997]:

$$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
  • new versions of TCP for high-speed
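The slide's numbers can be reproduced directly from the formula. This sketch only assumes that MSS is converted from bytes to bits (a factor of 8) so the result is in bits per second; the function names are made up for the example.

```python
import math


def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS converted to bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Solve the same formula for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss(10e9, 0.1, 1500)
print(round(w))   # 83333
print(loss)       # about 2.1e-10, which the slide rounds to 2 * 10^-10
```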

3-42 TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

3-43 Why is TCP fair?

Two competing sessions:
  • additive increase gives a slope of 1 as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
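The convergence argument can be checked numerically. The sketch assumes two flows with the same RTT that both see a loss whenever their combined rate exceeds the link capacity; the function name, starting rates, and capacity are arbitrary illustrative choices.

```python
def share_link(x1, x2, capacity, rounds):
    """Evolve two AIMD flows that share one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both add 1, which leaves the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = share_link(1.0, 50.0, capacity=100.0, rounds=1000)
print(abs(x1 - x2))  # very close to 0: the flows end up with equal shares
```

Additive increase moves the pair along a line of slope 1, and each halving pulls the pair toward the origin, so the difference between the two rates shrinks by half at every loss.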

3-44 Fairness (more)

Fairness and UDP:
  • multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
  • instead they use UDP: send audio/video at a constant rate, tolerate packet loss

Fairness, parallel TCP connections:
  • an application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., a link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets R/2
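The parallel-connection example is simple arithmetic, assuming every TCP connection on the link gets an equal share. `app_share` is a hypothetical helper name.

```python
from fractions import Fraction


def app_share(existing, new):
    """Fraction of R the new app receives if all connections get equal rates."""
    return Fraction(new, existing + new)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20, which the slide rounds to R/2
```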

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study: ATM ABR congestion control
  • Case study: ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control: additive increase multiplicative decre
  • TCP Congestion Control: details
  • TCP Slow Start
  • TCP: detecting, reacting to loss
  • TCP: switching from slow start to CA
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)

3-11 TCP flow control

  • the receiver "advertises" free buffer space by including the rwnd value in the TCP header of receiver-to-sender segments
      RcvBuffer size is set via socket options (typical default is 4096 bytes)
      many operating systems autoadjust RcvBuffer
  • the sender limits the amount of unACKed ("in-flight") data to the receiver's rwnd value
  • this guarantees the receive buffer will not overflow

[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data is passed up to the application process.]
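Setting the receive buffer through socket options looks like this with Python's standard `socket` module. The requested size of 65536 bytes is an arbitrary example value, and the size actually reported afterwards is up to the operating system (some systems round, cap, or double the request), so no particular output is assumed.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # Receive buffer size chosen by the operating system by default.
    default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

    # Request a different size; the OS may adjust the value it actually uses.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)
    new_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print(default_size, new_size)
```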

3-12 Connection Management

Before exchanging data, sender and receiver "handshake":
  • agree to establish a connection (each knowing the other is willing to establish a connection)
  • agree on connection parameters

Client and server each hold, between application and network:
    connection state: ESTAB
    connection variables:
      seq #: client-to-server, server-to-client
      rcvBuffer size at server, client

Client: Socket clientSocket = newSocket("hostname", "port number");
Server: Socket connectionSocket = welcomeSocket.accept();

3-13 Agreeing to establish a connection

2-way handshake:

[Figure: one side says "Let's talk", the other answers "OK", and both enter ESTAB. In protocol terms, the client chooses x and sends req_conn(x), the server replies acc_conn(x), and both enter ESTAB.]

Q: will a 2-way handshake always work in a network?
  • variable delays
  • retransmitted messages (e.g. req_conn(x)) due to message loss
  • message reordering
  • can't "see" the other side

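3-14 Agreeing to establish a connection

2-way handshake failure scenarios:

Scenario 1, half-open connection (no client):
  • the client chooses x and sends req_conn(x); the server enters ESTAB and replies acc_conn(x); the client enters ESTAB
  • the client retransmits req_conn(x)
  • connection x completes; the client terminates and the server forgets x
  • the retransmitted req_conn(x) then arrives and the server enters ESTAB again: a half-open connection with no client

Scenario 2, duplicate data accepted:
  • the client chooses x and sends req_conn(x); the server enters ESTAB and replies acc_conn(x); the client enters ESTAB
  • the client sends data(x+1), which the server accepts; the client also retransmits req_conn(x) and data(x+1)
  • connection x completes; the client terminates and the server forgets x
  • the retransmitted req_conn(x) arrives and the server enters ESTAB again; the retransmitted data(x+1) then arrives and is accepted a second time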

3-15 TCP 3-way handshake

Client state and server state both start in LISTEN.

  • client: chooses initial sequence number x and sends a TCP SYN message (SYNbit=1, Seq=x); enters SYNSENT
  • server: chooses initial sequence number y and sends a TCP SYNACK message, ACKing the SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); enters SYN RCVD
  • client: the received SYNACK(x) indicates the server is live; sends an ACK for the SYNACK (ACKbit=1, ACKnum=y+1) and enters ESTAB; this segment may contain client-to-server data
  • server: the received ACK(y) indicates the client is live; enters ESTAB
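The sequence and acknowledgment numbers exchanged in the three segments can be traced with a small model. It is only an illustration of the numbering: the `Segment` class and `three_way_handshake` function are hypothetical, and no real network traffic is involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: Optional[int] = None
    acknum: Optional[int] = None


def three_way_handshake(x, y):
    """Build the three segments and check that each side's number was ACKed."""
    syn = Segment(syn=True, seq=x)                                    # client -> server
    synack = Segment(syn=True, ack=True, seq=y, acknum=syn.seq + 1)   # server -> client
    ack = Segment(ack=True, acknum=synack.seq + 1)                    # client -> server

    client_sees_live_server = synack.acknum == x + 1
    server_sees_live_client = ack.acknum == y + 1
    return [syn, synack, ack], client_sees_live_server and server_sees_live_client


segments, established = three_way_handshake(x=100, y=300)
for segment in segments:
    print(segment)
print(established)  # True
```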

3-16 TCP 3-way handshake: FSM

Server side:
  • closed → listen (Λ): Socket connectionSocket = welcomeSocket.accept();
  • listen → SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1) and create a new socket for communication back to the client
  • SYN rcvd → ESTAB: on ACK(ACKnum=y+1), no action (Λ)

Client side:
  • closed → SYN sent: Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
  • SYN sent → ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)

3-17 TCP: closing a connection

  • client and server each close their side of the connection: send a TCP segment with FIN bit = 1
  • respond to a received FIN with an ACK; on receiving a FIN, the ACK can be combined with the host's own FIN
  • simultaneous FIN exchanges can be handled

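3-18 TCP: closing a connection

Client state and server state both start in ESTAB.

  • client: calls clientSocket.close() and sends FINbit=1, seq=x; enters FIN_WAIT_1 and can no longer send, but can receive data
  • server: replies ACKbit=1, ACKnum=x+1 and enters CLOSE_WAIT; it can still send data
  • client: enters FIN_WAIT_2 and waits for the server to close
  • server: sends FINbit=1, seq=y and enters LAST_ACK; it can no longer send data
  • client: replies ACKbit=1, ACKnum=y+1 and enters TIMED_WAIT, a timed wait for 2 × max segment lifetime, then CLOSED
  • server: enters CLOSED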

3-19 Principles of congestion control

3-20 Principles of congestion control

Congestion:
  • informally: "too many sources sending too much data too fast for the network to handle"
  • different from flow control!
  • manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
  • a top-10 problem!

3-21 Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • output link capacity: R
  • no retransmission

  • maximum per-connection throughput: R/2
  • large delays as the arrival rate $\lambda_{in}$ approaches capacity

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput is $\lambda_{out}$. Two plots against $\lambda_{in}$ (axis marked at R/2): $\lambda_{out}$ (marked at R/2) and delay.]

3-22 Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of timed-out packets:
      application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
      transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A and Host B, router with finite shared output link buffers; labels $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data plus retransmitted data), $\lambda_{out}$.]

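3-23 Causes/costs of congestion: scenario 2

Idealization (perfect knowledge): the sender sends only when router buffers are available.

[Figure: hosts A and B, with a copy of the packet held at the sender and free buffer space at a router with finite shared output link buffers; labels $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data plus retransmitted data), $\lambda_{out}$; plot of $\lambda_{out}$ against $\lambda_{in}$, axes marked at R/2.]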

3-24 Causes/costs of congestion: scenario 2

Idealization (known loss): packets can be lost, dropped at the router due to full buffers; the sender only resends if a packet is known to be lost.

[Figure: hosts A and B, with a copy of the packet held at the sender and no buffer space at the router; labels $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data plus retransmitted data), $\lambda_{out}$.]

Page 12: Congestion in networks

Transport Layer

3-12Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquoagree to establish connection (each knowing the

other willing to establish connection)agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer

3-13

Q will 2-way handshake always work in network

variable delaysretransmitted messages (eg

req_conn(x)) due to message lossmessage reorderingcanrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talkOK ESTAB

ESTAB

choose x req_conn(x)ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer

3-14

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

esserverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose x req_conn(x)ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose x req_conn(x)ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer

3-15TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data received ACK(y)

indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTENserver state

LISTEN

Transport Layer

3-16

TCP 3-way handshake FSM

closed

L

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client

SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)

L

Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer

3-19Principles of congestion control

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Transport Layer

3-24

lin original dataloutlin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer

3-25

lin original dataloutlin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2lin

l out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer

3-26

A

lin loutlincopy

free buffer space

timeout

R2

R2lin

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer

3-27

R2

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2lin

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer

3-28

four sendersmultihop paths timeoutretransmit

Q what happens as lin and lin

rsquo increase

finite shared output link buffers

Host A lout

Causescosts of congestion scenario 3

Host B

Host CHost D

lin original data

lin original data plus retransmitted data

A as red linrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput g 0

Transport Layer

3-29

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

l out

linrsquo

Transport Layer

3-30

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end congestion

controlno explicit

feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer

3-31Case study ATM ABR congestion

controlABR available bit rateldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should use

available bandwidth

if senderrsquos path congested sender throttled to

minimum guaranteed rate

RM (resource management) cellssent by sender interspersed

with data cellsbits in RM cell set by switches

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild

congestion) CI bit congestion indication

RM cells returned to sender by receiver with bits intact

Transport Layer

3-32Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in

returned RM cell

RM cell data cell

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control additive increase multiplicative decrease approach sender increases transmission

rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half

after loss

cwnd

TCP

sen

der

cong

estio

n w

indo

w s

ize

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer

3-35TCP Congestion Control details

sender limits transmission

cwnd is dynamic function of perceived network congestion

TCP sending rate

roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte

ACKed sent not-yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent-LastByteAcked

lt cwnd

sender sequence number space

rate~~cwndRTT bytessec

Transport Layer

3-36TCP Slow Start

when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for

every ACK received

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer

3-37

TCP detecting reacting to loss

loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer

3-38

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementationvariable ssthresh on loss event ssthresh is

set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer

3-39

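Summary: TCP Congestion Control

[FSM with three states: slow start, congestion avoidance, fast recovery. Each transition is written as "event: actions". Λ marks a transition with no triggering event.]

initial transition (Λ) into slow start:
cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0

slow start:
new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
duplicate ACK: dupACKcount++
timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment (stay in slow start)
cwnd > ssthresh (Λ): go to congestion avoidance
dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

congestion avoidance:
new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
duplicate ACK: dupACKcount++
timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

fast recovery:
duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance
timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start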
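The state machine summarized on this slide can be sketched as a small class. This is an illustrative model only: the class name `RenoSender`, the MSS value, and the method names are assumptions, and actual retransmission and segment sending are left out.

```python
MSS = 1460  # bytes; an assumed value for illustration


class RenoSender:
    """Tracks cwnd, ssthresh and dupACKcount across the three states."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * MSS / self.cwnd
        else:  # fast recovery ends on a new ACK
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_acks = 0
        if self.state == "slow_start" and self.cwnd > self.ssthresh:
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS          # window inflation per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:        # retransmit of missing segment omitted
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):             # same reaction in every state
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender()
for _ in range(3):
    s.on_new_ack()
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd, s.ssthresh)
```

After three new ACKs and three duplicate ACKs the sender is in fast recovery with ssthresh at half the earlier window and cwnd three MSS above it.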

Transport Layer

3-40

TCP throughput

avg. TCP thruput as function of window size, RTT?
ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
avg. window size (# in-flight bytes) is 3/4 W
avg. thruput is 3/4 W per RTT

$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{\text{RTT}}$ bytes/sec

[Figure: sawtooth window oscillating between W/2 and W.]
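The 3/4 factor is the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, with illustrative function names and made-up sample values:

```python
def average_window(w, steps=1000):
    """Mean of a window that ramps linearly from w/2 to w (one sawtooth cycle)."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec, using the 3/4 W per RTT estimate."""
    return 0.75 * w_bytes / rtt_seconds


print(average_window(100))            # close to 75, i.e. 3/4 of W
print(avg_throughput(100000, 0.125))  # bytes/sec
```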

Transport Layer

3-41

TCP Futures: TCP over "long, fat pipes"

example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
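Both numbers on this slide can be checked directly from the formula. The function names below are illustrative; rates are in bits/sec and MSS in bytes, so a factor of 8 converts between them.

```python
import math


def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Loss probability L that the formula needs for a target rate."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(window_segments(10e9, 0.1, 1500))   # about 83,333 segments
print(required_loss(10e9, 0.1, 1500))     # about 2e-10
```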

Transport Layer

3-42

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Transport Layer

3-43

Why is TCP fair?

two competing sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
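A short simulation illustrates why the two sessions drift toward the equal-share line: additive increase leaves the gap between them unchanged, while each halving cuts the gap in half. The sketch assumes both flows see every loss at the same moment, which is a simplification; the names and starting values are illustrative.

```python
def share_bottleneck(x1, x2, capacity, rounds):
    """Two AIMD flows on one link; both halve when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: decrease window by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1   # congestion avoidance: slope of 1
    return x1, x2


a, b = share_bottleneck(80.0, 10.0, capacity=100, rounds=500)
print(a, b, abs(a - b))
```

Starting from a very unequal split, the difference between the two flows shrinks toward zero after a few hundred rounds.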

Transport Layer

3-44

Fairness (more)

Fairness and UDP
multimedia apps often do not use TCP
do not want rate throttled by congestion control
instead use UDP: send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this
e.g., link of rate R with 9 existing connections:
new app asks for 1 TCP, gets rate R/10
new app asks for 11 TCPs, gets R/2
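The arithmetic behind the parallel-connection example, assuming every TCP connection on the link ends up with an equal share (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # share of R with a single connection
print(app_share(9, 11))   # share of R with 11 parallel connections
```

With 11 connections the new app holds 11 of the 20 connections on the link, slightly more than the R/2 quoted on the slide.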

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)

Transport Layer

3-13

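Agreeing to establish a connection

2-way handshake:

[Figure: two 2-way exchanges. Human analogy: "Let's talk" answered by "OK", after which both sides are ESTAB. Protocol: one side chooses x and sends req_conn(x), the other answers acc_conn(x), after which both sides are ESTAB.]

Q: will 2-way handshake always work in network?
variable delays
retransmitted messages (e.g. req_conn(x)) due to message loss
message reordering
can't "see" other side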

Transport Layer

3-14

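Agreeing to establish a connection

2-way handshake failure scenarios:

[Figure, first scenario: client chooses x and sends req_conn(x); server enters ESTAB and answers acc_conn(x); client enters ESTAB. The client also retransmits req_conn(x). Connection x completes, the client terminates, and the server forgets x. The retransmitted req_conn(x) then arrives and the server enters ESTAB again: half open connection (no client!).]

[Figure, second scenario: client chooses x and sends req_conn(x); server enters ESTAB and answers acc_conn(x); client enters ESTAB and sends data(x+1), which the server accepts. The client retransmits both req_conn(x) and data(x+1). Connection x completes, the client terminates, and the server forgets x. The retransmitted req_conn(x) arrives and the server enters ESTAB; the retransmitted data(x+1) arrives and the server accepts data(x+1) a second time.]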

Transport Layer

3-15

TCP 3-way handshake

[Figure: client state and server state timelines, both starting in LISTEN.]

client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state ESTAB
server: received ACK(y) indicates client is live; server state ESTAB
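The sequence and acknowledgment numbers in the three segments can be traced with a small model. This is a sketch using plain dictionaries as stand-ins for TCP segments; the helper names and the sample values of x and y are assumptions for illustration.

```python
def syn(x):
    """Client's first segment: SYNbit=1, Seq=x."""
    return {"SYN": 1, "seq": x}


def synack(y, syn_seg):
    """Server's reply: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1."""
    return {"SYN": 1, "ACK": 1, "seq": y, "acknum": syn_seg["seq"] + 1}


def ack(synack_seg):
    """Client's final segment: ACKbit=1, ACKnum=y+1."""
    return {"ACK": 1, "acknum": synack_seg["seq"] + 1}


def server_accepts(y, ack_seg):
    """Server reaches ESTAB only if the ACK acknowledges its own y."""
    return ack_seg.get("ACK") == 1 and ack_seg.get("acknum") == y + 1


s1 = syn(100)            # x = 100
s2 = synack(300, s1)     # y = 300
s3 = ack(s2)
print(s1, s2, s3, server_accepts(300, s3))
```

Because the server waits for an ACK that echoes y+1, a stale or unrelated segment does not move it to ESTAB, unlike the 2-way exchange.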

Transport Layer

3-16

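TCP 3-way handshake: FSM

[FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB. Each transition is written as "event / action"; Λ marks a missing event or action.]

server side:
closed to listen: Socket connectionSocket = welcomeSocket.accept(); / Λ
listen to SYN rcvd: SYN(x) / SYNACK(seq=y,ACKnum=x+1), create new socket for communication back to client
SYN rcvd to ESTAB: ACK(ACKnum=y+1) / Λ

client side:
closed to SYN sent: Socket clientSocket = new Socket("hostname","port number"); / SYN(seq=x)
SYN sent to ESTAB: SYNACK(seq=y,ACKnum=x+1) / ACK(ACKnum=y+1)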

Transport Layer

3-17

TCP: closing a connection

client, server each close their side of connection
send TCP segment with FIN bit = 1
respond to received FIN with ACK
on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

Transport Layer

3-18

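TCP: closing a connection

[Figure: client state and server state timelines, both starting in ESTAB.]

client: clientSocket.close(); send FINbit=1, seq=x; state FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1, ACKnum=x+1; state CLOSE_WAIT (can still send data)
client: state FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; state LAST_ACK (can no longer send data)
client: send ACKbit=1, ACKnum=y+1; state TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: on receiving the ACK, state CLOSED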
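The two state sequences can be written out as transition tables and walked in order. This sketch covers only the client-initiated close shown on the slide; the event labels and the `walk` helper are illustrative, not taken from a real TCP implementation.

```python
CLIENT_CLOSE = [
    ("ESTAB", "close(), send FIN(seq=x)", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "recv ACK(x+1)", "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN(seq=y), send ACK(y+1)", "TIMED_WAIT"),
    ("TIMED_WAIT", "2 * max segment lifetime elapses", "CLOSED"),
]

SERVER_CLOSE = [
    ("ESTAB", "recv FIN(seq=x), send ACK(x+1)", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "send FIN(seq=y)", "LAST_ACK"),
    ("LAST_ACK", "recv ACK(y+1)", "CLOSED"),
]


def walk(transitions, start="ESTAB"):
    """Follow the transitions in order and return the list of states visited."""
    state = start
    path = [state]
    for src, _event, dst in transitions:
        if src != state:
            raise ValueError(f"transition from {src} while in {state}")
        state = dst
        path.append(state)
    return path


print(walk(CLIENT_CLOSE))
print(walk(SERVER_CLOSE))
```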

Transport Layer

3-19

Principles of congestion control

Transport Layer

3-20

Principles of congestion control

congestion:
informally: "too many sources sending too much data too fast for network to handle"
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!

Transport Layer

3-21

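Causes/costs of congestion: scenario 1

two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission

[Figure: Host A and Host B each send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput is $\lambda_{out}$.]

[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes marked at $R/2$.] maximum per-connection throughput: $R/2$

[Plot: delay vs. $\lambda_{in}$, with $R/2$ marked on the horizontal axis.] large delays as arrival rate, $\lambda_{in}$, approaches capacity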
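The slide gives no formula for the delay curve. As a rough illustration only, the sketch below assumes an M/M/1 queue (mean delay 1/(capacity − total arrival rate)) fed by the two senders; that queueing model and the function name are assumptions, not part of the slides.

```python
def avg_delay(lam_in, capacity):
    """Mean delay under an assumed M/M/1 queue with two senders at lam_in each."""
    total = 2 * lam_in
    if total >= capacity:
        return float("inf")       # queue grows without bound
    return 1.0 / (capacity - total)


# Capacity R = 1.0 (arbitrary units); per-connection rates approach R/2.
for lam in (0.25, 0.40, 0.45, 0.49):
    print(lam, avg_delay(lam, 1.0))
```

Under this assumed model the delay stays modest at low load and grows sharply as the per-connection rate nears R/2.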

Transport Layer

3-22

Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of timed-out packet
application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A and Host B send through a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: output.]

Transport Layer

3-23

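Causes/costs of congestion: scenario 2

idealization: perfect knowledge
sender sends only when router buffers available

[Figure: Host A keeps a copy of each packet and sends toward Host B only when there is free buffer space in the finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: output.]

[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes marked at $R/2$.]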

Transport Layer

3-24

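Causes/costs of congestion: scenario 2

Idealization: known loss
packets can be lost, dropped at router due to full buffers
sender only resends if packet known to be lost

[Figure: Host A keeps a copy of a packet that arrives when the router has no buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: output toward Host B.]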

Transport Layer

3-25

Causes/costs of congestion: scenario 2

Idealization: known loss
packets can be lost, dropped at router due to full buffers
sender only resends if packet known to be lost

[Figure: Host A resends toward Host B once there is free buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: output.]

[Plot: $\lambda_{out}$ against the sending rate, both axes marked at $R/2$.] when sending at $R/2$, some packets are retransmissions, but asymptotic goodput is still $R/2$ (why?)

Transport Layer

3-26

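Causes/costs of congestion: scenario 2

Realistic: duplicates
packets can be lost, dropped at router due to full buffers
sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A's timeout fires while its copy is still held; with free buffer space at the router, both the original and the retransmitted copy reach Host B.]

[Plot: $\lambda_{out}$ against the sending rate, both axes marked at $R/2$.] when sending at $R/2$, some packets are retransmissions, including duplicates that are delivered!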

Transport Layer

3-27

Causes/costs of congestion: scenario 2

Realistic: duplicates
packets can be lost, dropped at router due to full buffers
sender times out prematurely, sending two copies, both of which are delivered

[Plot: $\lambda_{out}$ against the sending rate, both axes marked at $R/2$.] when sending at $R/2$, some packets are retransmissions, including duplicates that are delivered!

"costs" of congestion:
more work (retrans) for given "goodput"
unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput
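The cost of duplicate copies can be put in numbers with a one-line model. The average number of copies per packet is an assumed input here (the slides do not give a value); the function name is illustrative.

```python
def goodput(link_share, avg_copies_per_packet):
    """Useful throughput when each delivered packet crosses the link several times."""
    return link_share / avg_copies_per_packet


R = 1.0                       # link capacity in arbitrary units
print(goodput(R / 2, 1.0))    # no unneeded retransmissions
print(goodput(R / 2, 2.0))    # every packet forwarded twice on average
```

If, for instance, every packet were forwarded twice on average, a connection filling its R/2 share of the link would see a goodput of only R/4.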

Transport Layer

3-28

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput $\to 0$

[Figure: Host A, Host B, Host C and Host D send over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: output.]

Transport Layer

3-29

Causes/costs of congestion: scenario 3

another "cost" of congestion:
when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Plot: $\lambda_{out}$ vs. $\lambda'_{in}$, both axes marked at $C/2$.]

Transport Layer

3-30

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP

network-assisted congestion control:
routers provide feedback to end systems
single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
explicit rate for sender to send at

Transport Layer

3-31

Case study: ATM ABR congestion control

ABR: available bit rate:
"elastic service"
if sender's path "underloaded": sender should use available bandwidth
if sender's path congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:
sent by sender, interspersed with data cells
bits in RM cell set by switches ("network-assisted")
NI bit: no increase in rate (mild congestion)
CI bit: congestion indication
RM cells returned to sender by receiver, with bits intact

Transport Layer

3-32

Case study: ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell
congested switch may lower ER value in cell
senders' send rate thus max supportable rate on path

EFCI bit in data cells: set to 1 in congested switch
if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure: RM cells interspersed with data cells along the path.]
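The effect of the ER field can be sketched as a minimum taken along the path: each switch may lower the value but never raise it. The function name and the sample rates below are assumptions for illustration, not values from the ATM specification.

```python
def returned_explicit_rate(sender_er, switch_rates):
    """ER value returned to the sender after each switch on the path has seen the cell.

    switch_rates lists, in path order, the rate each switch can support.
    """
    er = sender_er
    for supportable in switch_rates:
        er = min(er, supportable)   # a congested switch may lower ER
    return er


print(returned_explicit_rate(100, [80, 120, 60]))   # limited by the tightest switch
print(returned_explicit_rate(100, [150, 200]))      # no switch lowers the value
```

The value that comes back is therefore the maximum rate that every switch on the path can support.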

Transport Layer

3-33

TCP congestion control

Page 15: Congestion in networks

Transport Layer

3-15TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data received ACK(y)

indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTENserver state

LISTEN

Transport Layer

3-16

TCP 3-way handshake FSM

closed

L

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)SYNACK(seq=yACKnum=x+1)create new socket for communication back to client

SYNACK(seq=yACKnum=x+1)ACK(ACKnum=y+1)ACK(ACKnum=y+1)

L

Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Principles of congestion control

congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control
  • manifestations:
      • lost packets (buffer overflow at routers)
      • long delays (queueing in router buffers)
  • a top-10 problem

Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • output link capacity: R
  • no retransmission
  • maximum per-connection throughput: R/2
  • large delays as arrival rate $\lambda_{in}$ approaches capacity

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ versus $\lambda_{in}$ (both axes marked R/2) and delay versus $\lambda_{in}$.]

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of timed-out packet
      • application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
      • transport-layer input includes retransmissions: $\lambda'_{in} \geq \lambda_{in}$

[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; finite shared output link buffers.]

Causes/costs of congestion: scenario 2 (2)

idealization: perfect knowledge
  • sender sends only when router buffers available

[Figure: Host A sends a copy only when there is free buffer space in the finite shared output link buffers. Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes marked R/2.]

Causes/costs of congestion: scenario 2 (3)

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost
  • when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)

[Figure: Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once buffer space is free. Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked R/2.]

Causes/costs of congestion: scenario 2 (4)

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered
  • when sending at R/2, some packets are retransmissions, including duplicates that are delivered

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput

[Figure: Host A times out and sends a second copy although buffer space was free. Plot: $\lambda_{out}$ versus $\lambda'_{in}$, axes marked R/2.]
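The cost of duplicate deliveries can be made concrete with one line of arithmetic. The sketch below assumes, purely for illustration, that the link carries each packet a fixed average number of times; the `goodput` function and its parameters are hypothetical, and rates are expressed as fractions of the link capacity R.

```python
def goodput(offered_load: float, avg_copies_per_packet: float) -> float:
    """Useful throughput when the link carries each packet avg_copies_per_packet times.

    offered_load is the transport-layer rate (originals plus retransmissions),
    given as a fraction of link capacity R. The copy count is an assumed input.
    """
    if avg_copies_per_packet < 1:
        raise ValueError("every delivered packet is carried at least once")
    return offered_load / avg_copies_per_packet
```

With an offered load of R/2 and every packet carried twice on average, `goodput(0.5, 2)` gives 0.25, that is, R/4 of useful data.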

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0

[Figure: Hosts A, B, C and D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; finite shared output link buffers.]

Causes/costs of congestion: scenario 3 (2)

another "cost" of congestion:
  • when packet dropped, any upstream transmission capacity used for that packet was wasted

[Figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, axes marked C/2.]

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      • explicit rate for sender to send at

Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path "underloaded": sender should use available bandwidth
  • if sender's path congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      • NI bit: no increase in rate (mild congestion)
      • CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

Case study: ATM ABR congestion control (2)

  • two-byte ER (explicit rate) field in RM cell
      • congested switch may lower ER value in cell
      • senders' send rate thus max supportable rate on path
  • EFCI bit in data cells: set to 1 in congested switch
      • if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure: RM cells interspersed with data cells along the path.]
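The effect of the ER field can be sketched as a running minimum along the path: each switch may lower the value but never raises it. The function name and the per-switch rate list below are hypothetical, and units are whatever rate unit the sender uses.

```python
from typing import Iterable


def explicit_rate_after_path(initial_er: float, switch_rates: Iterable[float]) -> float:
    """ER value the sender sees in the returned RM cell.

    switch_rates holds the rate each switch on the path can currently support
    (assumed inputs). A switch lowers ER only if it supports less.
    """
    er = initial_er
    for supportable in switch_rates:
        er = min(er, supportable)
    return er
```

For a cell that starts with ER 100 and crosses switches that can support 80, 40 and 60, the returned value is 40, the maximum supportable rate on the path.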

TCP congestion control: additive increase, multiplicative decrease

  • approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
      • additive increase: increase cwnd by 1 MSS every RTT until loss detected
      • multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD sawtooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half.]
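The sawtooth can be reproduced with a few lines of simulation. This sketch counts cwnd in units of MSS and assumes, as a simplification not stated on the slide, that a loss is detected exactly when cwnd exceeds a fixed path capacity; `aimd` and its parameters are hypothetical names.

```python
from typing import List


def aimd(capacity_mss: float, rtts: int, cwnd: float = 1.0) -> List[float]:
    """Return cwnd (in MSS) sampled once per RTT under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity_mss:   # assumed loss signal
            cwnd = cwnd / 2       # multiplicative decrease: cut cwnd in half
        else:
            cwnd += 1             # additive increase: 1 MSS per RTT
    return trace
```

With a capacity of 4 MSS, `aimd(4, 8)` returns `[1.0, 2.0, 3.0, 4.0, 5.0, 2.5, 3.5, 4.5]`: a linear climb, a halving after the loss at 5, then another climb.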

TCP Congestion Control: details

  • sender limits transmission: $LastByteSent - LastByteAcked \leq cwnd$
  • cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  • $\text{rate} \approx \frac{cwnd}{RTT}$ bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), last byte sent, and cwnd spanning the in-flight bytes.]
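Both relations above are simple enough to check numerically. The helper names below are hypothetical; the window test follows the usual reading that the number of in-flight bytes may not exceed cwnd.

```python
def sending_rate(cwnd_bytes: float, rtt_seconds: float) -> float:
    """Approximate sending rate in bytes/sec: one window of cwnd bytes per RTT."""
    return cwnd_bytes / rtt_seconds


def can_send(last_byte_sent: int, last_byte_acked: int, cwnd_bytes: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps the in-flight byte count within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd_bytes
```

For instance, a window of 14600 bytes with an RTT of 0.5 s gives `sending_rate(14600, 0.5)` = 29200 bytes/sec.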

TCP Slow Start

  • when connection begins, increase rate exponentially until first loss event:
      • initially cwnd = 1 MSS
      • double cwnd every RTT
      • done by incrementing cwnd for every ACK received
  • summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.]
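The doubling follows directly from the per-ACK increment: a window of n segments produces n ACKs in one RTT, and each ACK adds one MSS. A minimal sketch, assuming no loss and one ACK per segment (`slow_start` is a hypothetical name):

```python
from typing import List


def slow_start(rtts: int, mss: int = 1) -> List[int]:
    """Return cwnd at the start of each RTT, assuming no loss and one ACK per segment."""
    cwnd = mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        acks = cwnd // mss    # one ACK comes back for every segment sent this RTT
        cwnd += acks * mss    # +1 MSS per ACK, so cwnd doubles each RTT
    return trace
```

`slow_start(4)` returns `[1, 2, 4, 8]`, matching the one, two, four segments pattern in the figure.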

TCP: detecting, reacting to loss

  • loss indicated by timeout:
      • cwnd set to 1 MSS
      • window then grows exponentially (as in slow start) to threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs: TCP RENO
      • dup ACKs indicate network capable of delivering some segments
      • cwnd is cut in half, window then grows linearly
  • TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?

A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event

Summary: TCP Congestion Control

  • initially (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
  • slow start
      • new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
      • duplicate ACK: dupACKcount++
      • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
      • cwnd > ssthresh (Λ): go to congestion avoidance
      • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
  • congestion avoidance
      • new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
      • duplicate ACK: dupACKcount++
      • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
      • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery
  • fast recovery
      • duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
      • new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
      • timeout: ssthresh = cwnd/2; cwnd = 1; dupACKcount = 0; retransmit missing segment; go to slow start
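The state machine summarized on this slide can be sketched as a small class. This is an illustration under stated assumptions, not a real TCP stack: cwnd and ssthresh are counted in units of MSS (so the initial ssthresh of 64 is an assumed stand-in for the slide's 64 KB), retransmission and segment transmission are left out, and the class and method names are hypothetical.

```python
MSS = 1.0  # cwnd and ssthresh are counted in MSS units in this sketch


class RenoSender:
    def __init__(self) -> None:
        self.state = "slow_start"
        self.cwnd = 1.0 * MSS
        self.ssthresh = 64.0 * MSS  # assumed: 64 MSS in place of the slide's 64 KB
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        if self.state == "slow_start":
            self.cwnd += MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * (MSS / self.cwnd)  # about 1 MSS per RTT
        else:  # fast_recovery ends on the first new ACK
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:  # the missing segment would be retransmitted here
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self) -> None:  # the missing segment would be retransmitted here
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0 * MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding the object a sequence of `on_new_ack`, `on_dup_ack` and `on_timeout` calls and printing `state` and `cwnd` after each one is a quick way to trace a path through the diagram.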

TCP throughput

  • avg. TCP thruput as function of window size, RTT?
      • ignore slow start, assume always data to send
  • W: window size (measured in bytes) where loss occurs
      • avg. window size (number of in-flight bytes) is $\frac{3}{4} W$
      • avg. thruput is $\frac{3}{4} W$ per RTT

$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec

[Figure: window size oscillating between W/2 and W over time.]
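The factor 3/4 is the mean of a window that ramps linearly from W/2 up to W. A short sketch of that calculation, with `avg_throughput` as a hypothetical helper name:

```python
def avg_throughput(w_bytes: float, rtt_seconds: float) -> float:
    """Average throughput in bytes/sec for a sawtooth window between W/2 and W."""
    avg_window = (w_bytes / 2 + w_bytes) / 2  # mean of a linear ramp = 3/4 * W
    return avg_window / rtt_seconds
```

For W = 80000 bytes and RTT = 0.5 s, the average window is 60000 bytes and `avg_throughput(80000, 0.5)` is 120000 bytes/sec.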

TCP Futures: TCP over "long, fat pipes"

  • example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
  • new versions of TCP for high-speed
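Both numbers in the example can be rechecked from the formula. The sketch below computes the window needed to fill the pipe and inverts the throughput relation to solve for L; the function names are hypothetical, and MSS is converted from bytes to bits so that it matches a rate given in bits/sec.

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

With 1500-byte segments, a 100 ms RTT and a 10 Gbps target, the window comes out at about 83,333 segments and the loss rate at roughly 2.1 × 10⁻¹⁰, in line with the figures on the slide.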

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line.]
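The convergence argument can be checked with a toy simulation of two flows. It assumes, as an idealization, that both flows share the same RTT, add one unit per round, and both halve whenever their combined rate exceeds the link capacity; `two_flows` is a hypothetical name.

```python
from typing import Tuple


def two_flows(x1: float, x2: float, capacity: float, rounds: int) -> Tuple[float, float]:
    """Run synchronized AIMD for two flows and return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both cut by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: slope 1 in the (x1, x2) plane
    return x1, x2
```

Additive increase leaves the difference between the two rates unchanged, while each loss halves it, so a starting point such as (80, 10) on a link of capacity 100 ends up with nearly equal rates after a few hundred rounds.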

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      • do not want rate throttled by congestion control
  • instead use UDP:
      • send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets R/2
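The parallel-connection example is per-connection fair sharing applied to a connection count. A minimal sketch, assuming every connection gets an equal share of R (`app_share` is a hypothetical name):

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

`app_share(9, 1)` is 1/10, while `app_share(9, 11)` is 11/20, slightly more than the R/2 quoted on the slide.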




Transport Layer

3-17TCP closing a connection

client server each close their side of connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN

simultaneous FIN exchanges can be handled

Transport Layer

3-18

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

closecan stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

Transport Layer

3-19Principles of congestion control

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Transport Layer

3-24

lin original dataloutlin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer

3-25

lin original dataloutlin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2lin

l out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer

3-26

A

lin loutlincopy

free buffer space

timeout

R2

R2lin

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer

3-27

R2

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2lin

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer

3-28

four sendersmultihop paths timeoutretransmit

Q what happens as lin and lin

rsquo increase

finite shared output link buffers

Host A lout

Causescosts of congestion scenario 3

Host B

Host CHost D

lin original data

lin original data plus retransmitted data

A as red linrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput g 0

Transport Layer

3-29

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

l out

linrsquo

Transport Layer

3-30

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end congestion

controlno explicit

feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer

3-31Case study ATM ABR congestion

controlABR available bit rateldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should use

available bandwidth

if senderrsquos path congested sender throttled to

minimum guaranteed rate

RM (resource management) cellssent by sender interspersed

with data cellsbits in RM cell set by switches

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild

congestion) CI bit congestion indication

RM cells returned to sender by receiver with bits intact

Transport Layer

3-32Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in

returned RM cell

RM cell data cell

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control additive increase multiplicative decrease approach sender increases transmission

rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half

after loss

cwnd

TCP

sen

der

cong

estio

n w

indo

w s

ize

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer

3-35TCP Congestion Control details

sender limits transmission

cwnd is dynamic function of perceived network congestion

TCP sending rate

roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte

ACKed sent not-yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent-LastByteAcked

lt cwnd

sender sequence number space

rate~~cwndRTT bytessec

Transport Layer

3-36TCP Slow Start

when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for

every ACK received

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer

3-37

TCP detecting reacting to loss

loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer

3-38

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementationvariable ssthresh on loss event ssthresh is

set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer

3-39

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer

3-40TCP throughput

avg TCP thruput as function of window size RTTignore slow start assume always data to

sendW window size (measured in bytes) where loss

occursavg window size ( in-flight bytes) is frac34 Wavg thruput is 34W per RTTW

W2

avg TCP thruput = 34WRTTbytessec

Transport Layer

3-41TCP Futures TCP over ldquolong fat pipesrdquoexample 1500 byte segments 100ms RTT want

10 Gbps throughputrequires W = 83333 in-flight segmentsthroughput in terms of segment loss probability L

[Mathis 1997]

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash a very small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

Transport Layer

3-42

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Transport Layer

3-43

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Plot: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
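The convergence argument can be replayed numerically. The sketch below is a simplified model, assuming both flows see every loss at the same instant and that loss happens whenever their combined rate exceeds the link capacity; the function name and the starting rates are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Return the (x1, x2) trajectory of two synchronized AIMD flows."""
    history = [(x1, x2)]
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease window by a factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both flows add the same amount (slope 1)
            x1, x2 = x1 + 1, x2 + 1
        history.append((x1, x2))
    return history


trajectory = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=500)
print(trajectory[0], trajectory[-1])
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the trajectory drifts toward the equal-share line.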

Transport Layer

3-44

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      do not want rate throttled by congestion control
  • instead use UDP:
      send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets R/2
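The parallel-connection example is per-connection fair sharing applied to an application that owns several connections. A minimal sketch, assuming every connection on the link gets an equal share (the function name is made up for the example):

```python
from fractions import Fraction


def app_share(existing: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens new_connections."""
    total = existing + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))    # one connection among ten
print(app_share(9, 11))   # eleven connections among twenty
```

With 11 of 20 connections the exact share is 11/20 of R, which the slide rounds to R/2.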

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 18: Congestion in networks

Transport Layer

3-18

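TCP: closing a connection

[Timeline diagram between client and server, reconstructed as steps.]

client state: ESTAB; server state: ESTAB

  • client calls clientSocket.close() and sends FINbit=1, seq=x
      client enters FIN_WAIT_1: can no longer send but can receive data
  • server replies ACKbit=1; ACKnum=x+1
      server enters CLOSE_WAIT: can still send data
      client enters FIN_WAIT_2: wait for server close
  • server sends FINbit=1, seq=y
      server enters LAST_ACK: can no longer send data
  • client replies ACKbit=1; ACKnum=y+1
      client enters TIMED_WAIT: timed wait for 2*max segment lifetime, then CLOSED
      server enters CLOSED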
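The close exchange can be written down as an ordered table of events and the states each side is in afterwards. This is only a bookkeeping sketch built from the state names on the slide; the table name, the helper `replay`, and the exact wording of each event are assumptions for illustration, and no sockets are involved.

```python
# Each entry: (event, client state afterwards, server state afterwards)
CLOSE_SEQUENCE = [
    ("connection open", "ESTAB", "ESTAB"),
    ("client sends FIN (seq=x)", "FIN_WAIT_1", "ESTAB"),
    ("server receives FIN, sends ACK (ACKnum=x+1)", "FIN_WAIT_1", "CLOSE_WAIT"),
    ("client receives ACK", "FIN_WAIT_2", "CLOSE_WAIT"),
    ("server sends FIN (seq=y)", "FIN_WAIT_2", "LAST_ACK"),
    ("client receives FIN, sends ACK (ACKnum=y+1)", "TIMED_WAIT", "LAST_ACK"),
    ("server receives ACK", "TIMED_WAIT", "CLOSED"),
    ("client timed wait (2 * max segment lifetime) expires", "CLOSED", "CLOSED"),
]


def replay(sequence):
    """Print the state of both sides after every event."""
    for event, client, server in sequence:
        print(f"{event:55s} client={client:11s} server={server}")


replay(CLOSE_SEQUENCE)
```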

Transport Layer

3-19

Principles of congestion control

Transport Layer

3-20

Principles of congestion control

congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
  • a top-10 problem!

Transport Layer

3-21

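Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • output link capacity: R
  • no retransmission

[Figure: Host A sends original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; Host B receives throughput $\lambda_{out}$]

[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2. Maximum per-connection throughput: R/2]

[Plot: delay versus $\lambda_{in}$, up to R/2. Large delays as arrival rate $\lambda_{in}$ approaches capacity]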
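The two plots on this slide can be mimicked with a few lines of arithmetic. The throughput cap at R/2 comes from the slide; the delay curve below is an assumption, using a simple 1/(R - 2*lambda_in) queueing shape only to show delay growing without bound near capacity. The slide itself gives no delay formula, and both function names are hypothetical.

```python
def per_connection_throughput(lambda_in: float, link_capacity: float) -> float:
    """Two senders share the link, so each gets at most R/2."""
    return min(lambda_in, link_capacity / 2)


def illustrative_delay(lambda_in: float, link_capacity: float) -> float:
    """ASSUMED shape 1/(R - 2*lambda_in); not taken from the slide."""
    headroom = link_capacity - 2 * lambda_in   # spare capacity with two equal senders
    return float("inf") if headroom <= 0 else 1.0 / headroom


R = 1.0  # hypothetical link capacity, arbitrary units
for lam in (0.1, 0.3, 0.45, 0.49):
    print(lam, per_connection_throughput(lam, R), illustrative_delay(lam, R))
```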

Transport Layer

3-22

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of timed-out packet
      application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
      transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with finite shared output link buffers; Host B receives $\lambda_{out}$]

Transport Layer

3-23

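Causes/costs of congestion: scenario 2

idealization: perfect knowledge
  • sender sends only when router buffers available

[Figure: Host A keeps a copy of each packet and sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with finite shared output link buffers that has free buffer space; Host B receives $\lambda_{out}$]

[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]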

Transport Layer

3-24

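Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A keeps a copy and sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data); the packet arrives at a router with no buffer space; Host B receives $\lambda_{out}$]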

Transport Layer

3-25

Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with free buffer space; Host B receives $\lambda_{out}$]

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to R/2. When sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)]

Transport Layer

3-26

Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A times out and sends a second copy ($\lambda_{in}$, $\lambda'_{in}$) into a router with free buffer space; Host B receives $\lambda_{out}$]

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered!]

Transport Layer

3-27

Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to R/2. When sending at R/2, some packets are retransmissions, including duplicates that are delivered!]

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput

Transport Layer

3-28

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0

[Figure: Hosts A, B, C, and D connected through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$ at the receiving side]

Transport Layer

3-29

Causes/costs of congestion: scenario 3

another "cost" of congestion:
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to C/2]

Transport Layer

3-30

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at

Transport Layer

3-31

Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path "underloaded": sender should use available bandwidth
  • if sender's path congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      NI bit: no increase in rate (mild congestion)
      CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

Transport Layer

3-32

Case study: ATM ABR congestion control

  • two-byte ER (explicit rate) field in RM cell
      congested switch may lower ER value in cell
      senders' send rate thus max supportable rate on path
  • EFCI bit in data cells: set to 1 in congested switch
      if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure: RM cells interspersed with data cells along the path]
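The effect of the ER field can be sketched as a value that each switch on the path is allowed to lower, so the sender ends up with the smallest rate any switch can support. The model below is a toy: the `RMCell` class, the `pass_through_switch` helper, and the rate numbers are all hypothetical and do not reflect the real ATM cell layout.

```python
from dataclasses import dataclass


@dataclass
class RMCell:
    er: float          # explicit rate requested by the sender
    ni: bool = False   # no increase in rate (mild congestion)
    ci: bool = False   # congestion indication


def pass_through_switch(cell: RMCell, supportable_rate: float,
                        mild_congestion: bool = False,
                        congested: bool = False) -> RMCell:
    """A switch may lower ER and set NI/CI; it never raises ER or clears bits."""
    cell.er = min(cell.er, supportable_rate)
    cell.ni = cell.ni or mild_congestion
    cell.ci = cell.ci or congested
    return cell


cell = RMCell(er=100.0)
for rate, mild in [(80.0, False), (40.0, True), (60.0, False)]:
    pass_through_switch(cell, rate, mild_congestion=mild)
print(cell)   # ER ends at the minimum supportable rate along the path
```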

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control: additive increase, multiplicative decrease

  • approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
      additive increase: increase cwnd by 1 MSS every RTT until loss detected
      multiplicative decrease: cut cwnd in half after loss

[Plot: cwnd (TCP sender congestion window size) versus time. AIMD sawtooth behavior, probing for bandwidth: additively increase window size until loss occurs, then cut window in half]
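The sawtooth can be generated with a few lines. This is a simplified sketch that assumes loss occurs every time cwnd reaches a fixed window (here called `loss_window_mss`, a made-up parameter) and that cwnd is counted in units of MSS, one step per RTT.

```python
def aimd_sawtooth(loss_window_mss: float, rtts: int) -> list:
    """cwnd (in MSS) per RTT: +1 MSS each RTT, halved once loss_window_mss is reached."""
    cwnd = loss_window_mss / 2
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_window_mss:
            cwnd = cwnd / 2      # multiplicative decrease after loss
        else:
            cwnd += 1            # additive increase
    return trace


print(aimd_sawtooth(16, 18))
```

Plotting the returned list against its index gives the sawtooth shape described on the slide.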

Transport Layer

3-35

TCP Congestion Control: details

  • sender limits transmission:

      LastByteSent - LastByteAcked ≤ cwnd

  • cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes

      rate ≈ cwnd / RTT bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight", spanning cwnd), and last byte sent]
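Both rules on this slide reduce to one-line checks. The sketch below assumes byte counters named after the slide's variables and reads the limit as "in-flight bytes may not exceed cwnd"; the function names and sample numbers are hypothetical.

```python
def can_send(last_byte_sent: int, last_byte_acked: int,
             cwnd: int, segment_len: int) -> bool:
    """True if sending segment_len more bytes keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + segment_len <= cwnd


def approx_rate(cwnd_bytes: float, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cwnd_bytes / rtt_s


print(can_send(5000, 2000, cwnd=4000, segment_len=1000))
print(approx_rate(14_600, 0.5))
```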

Transport Layer

3-36

TCP Slow Start

  • when connection begins, increase rate exponentially until first loss event:
      initially cwnd = 1 MSS
      double cwnd every RTT
      done by incrementing cwnd for every ACK received
  • summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B; Host A sends one segment in the first RTT, two segments in the second, four segments in the third]
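The doubling falls out of the per-ACK increment: every segment in the window is ACKed within one RTT, and each ACK adds 1 MSS. A minimal sketch, assuming no loss, one ACK per segment, and cwnd counted in whole segments (the function name is made up):

```python
def slow_start_windows(rtts: int, mss: int = 1) -> list:
    """cwnd at the start of each RTT when every segment is ACKed."""
    cwnd = 1 * mss
    windows = []
    for _ in range(rtts):
        windows.append(cwnd)
        acks = cwnd // mss    # one ACK per segment sent this RTT
        cwnd += acks * mss    # increment cwnd by 1 MSS for every ACK
    return windows


print(slow_start_windows(4))   # one, two, four, eight segments
```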

Transport Layer

3-37

TCP: detecting, reacting to loss

  • loss indicated by timeout:
      cwnd set to 1 MSS
      window then grows exponentially (as in slow start) to threshold, then grows linearly
  • loss indicated by 3 duplicate ACKs: TCP RENO
      dup ACKs indicate network capable of delivering some segments
      cwnd is cut in half, window then grows linearly
  • TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
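The difference between the two variants is only in how they treat three duplicate ACKs. A minimal sketch, with cwnd counted in MSS units and with made-up names for the function, the event strings, and the variant strings; setting the threshold to half the window is an assumption carried over from common practice rather than stated on this slide.

```python
def react_to_loss(cwnd: float, event: str, variant: str = "reno"):
    """Return (new_cwnd, new_ssthresh) in MSS units after a loss event."""
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = cwnd / 2                 # assumed: threshold is half the window
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh              # restart from 1 MSS
    return ssthresh, ssthresh           # Reno on 3 dup ACKs: cut cwnd in half


print(react_to_loss(16, "triple_dup_ack", "reno"))
print(react_to_loss(16, "triple_dup_ack", "tahoe"))
print(react_to_loss(16, "timeout", "reno"))
```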

Page 20: Congestion in networks

Transport Layer

3-20

congestion informally ldquotoo many sources sending too much data too

fast for network to handlerdquodifferent from flow controlmanifestationslost packets (buffer overflow at routers)long delays (queueing in router

buffers)a top-10 problem

Principles of congestion control

Transport Layer

3-21

Causescosts of congestion scenario 1 two senders two receiversone router infinite buffers output link capacity Rno retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data lin

Host B

throughput lout

R2

R2

l out

lin R2de

lay

lin large delays as arrival

rate lin approaches capacity

Transport Layer

3-22

one router finite buffers sender retransmission of timed-out packet application-layer input = application-layer output lin = lout

transport-layer input includes retransmissions lin lin

finite shared output link buffers

Host A

lin original data

Host B

loutlin original data plus retransmitted data

Causescosts of congestion scenario 2

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Transport Layer

3-24

lin original dataloutlin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer

3-25

lin original dataloutlin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2lin

l out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer

3-26

A

lin loutlincopy

free buffer space

timeout

R2

R2lin

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer

3-27

R2

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2lin

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer

3-28

four sendersmultihop paths timeoutretransmit

Q what happens as lin and lin

rsquo increase

finite shared output link buffers

Host A lout

Causescosts of congestion scenario 3

Host B

Host CHost D

lin original data

lin original data plus retransmitted data

A as red linrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput g 0

Transport Layer

3-29

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

l out

linrsquo

Transport Layer

3-30

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end congestion

controlno explicit

feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer

3-31Case study ATM ABR congestion

controlABR available bit rateldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should use

available bandwidth

if senderrsquos path congested sender throttled to

minimum guaranteed rate

RM (resource management) cellssent by sender interspersed

with data cellsbits in RM cell set by switches

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild

congestion) CI bit congestion indication

RM cells returned to sender by receiver with bits intact

Transport Layer

3-32Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in

returned RM cell

RM cell data cell

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control additive increase multiplicative decrease approach sender increases transmission

rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half

after loss

cwnd

TCP

sen

der

cong

estio

n w

indo

w s

ize

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer

3-35TCP Congestion Control details

sender limits transmission

cwnd is dynamic function of perceived network congestion

TCP sending rate

roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte

ACKed sent not-yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent-LastByteAcked

lt cwnd

sender sequence number space

rate~~cwndRTT bytessec

Transport Layer

3-36TCP Slow Start

when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for

every ACK received

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer

3-37

TCP detecting reacting to loss

loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer

3-38

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementationvariable ssthresh on loss event ssthresh is

set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Summary: TCP Congestion Control

[FSM with three states: slow start, congestion avoidance, fast recovery. Transitions are listed below as event: actions. Λ marks a transition with no event or no action.]

initial transition
  • Λ: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start

slow start
  • new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
  • cwnd > ssthresh: Λ; go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

congestion avoidance
  • new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

fast recovery
  • duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  • new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
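The state machine summarized on this slide translates almost line for line into a small class. This is a sketch only: the class name `RenoFsm`, the state strings, and the 1460-byte MSS are assumptions, and actual segment transmission and retransmission are left out so that only the cwnd, ssthresh, and dupACKcount bookkeeping remains.

```python
MSS = 1460  # bytes; an assumed segment size for illustration


class RenoFsm:
    """Bookkeeping for slow start, congestion avoidance, fast recovery."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS                      # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * MSS / self.cwnd    # about +1 MSS per RTT
        else:                                     # fast_recovery
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                      # inflate window per dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"          # missing segment resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"                 # missing segment resent here


fsm = RenoFsm()
for _ in range(3):
    fsm.on_new_ack()
for _ in range(3):
    fsm.on_dup_ack()
print(fsm.state, fsm.cwnd, fsm.ssthresh)
```

Keeping one method per event mirrors the event labels on the FSM arrows, which makes it easy to check each branch against the slide.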

TCP throughput

avg. TCP throughput as function of window size, RTT?
  • ignore slow start, assume always data to send

W: window size (measured in bytes) where loss occurs
  • avg. window size (# in-flight bytes) is 3/4 W
  • avg. throughput is 3/4 W per RTT

[Figure: window size sawtooth oscillating between W/2 and W.]

$\text{avg TCP throughput} = \dfrac{3}{4}\,\dfrac{W}{\text{RTT}}$ bytes/sec
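The 3/4 factor is the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, as a sketch with hypothetical function names:

```python
def avg_window(w, steps=1000):
    """Mean of a window that ramps linearly from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


print(avg_window(100000))           # close to 75000, i.e. 3/4 of W
print(avg_throughput(100000, 0.5))
```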

TCP Futures: TCP over "long, fat pipes"

example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
  • new versions of TCP for high-speed
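Both numbers in the example can be recomputed from the formula. The sketch below uses hypothetical function names and converts the 1500-byte MSS to bits so that rates are in bits per second.

```python
import math


def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Loss probability L at which the formula yields rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss(10e9, 0.1, 1500)
print(round(w), loss)   # about 83333 segments and a loss rate near 2e-10
```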

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes up to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
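The convergence argument can be checked numerically. In this sketch both flows are assumed to have the same RTT and to see every loss at the same time (a synchronized-loss assumption made here, not stated on the slide); the function name is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Throughputs of two AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: gap unchanged
    return x1, x2


a, b = two_flow_aimd(80, 10, capacity=100, rounds=500)
print(a, b, abs(a - b))
```

Additive increase leaves the difference between the two rates unchanged while every halving cuts it in half, so the rates approach the equal-share line.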

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP: do not want rate throttled by congestion control
  • instead use UDP: send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets R/2
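The arithmetic behind the example, assuming every connection gets an equal share of the link, is sketched below with a hypothetical helper. With 11 of 20 connections the exact share is 11/20 of R, which the slide rounds to R/2.

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R won by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20, slightly more than 1/2
```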

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • output link capacity: R
  • no retransmission

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput is $\lambda_{out}$.]

[Figure: two plots against $\lambda_{in}$, x-axis marked at R/2: throughput $\lambda_{out}$ (y-axis marked at R/2) and delay.]

  • maximum per-connection throughput: R/2
  • large delays as arrival rate, $\lambda_{in}$, approaches capacity

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of timed-out packet
      • application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
      • transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A sends into a router with finite shared output link buffers; Host B receives. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: output.]

Causes/costs of congestion: scenario 2 (2)

idealization: perfect knowledge
  • sender sends only when router buffers available

[Figure: Host A keeps a copy and sends to Host B through finite shared output link buffers with free buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data. Plot of $\lambda_{out}$ against $\lambda_{in}$, both axes marked at R/2.]

Causes/costs of congestion: scenario 2 (3)

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A keeps a copy and sends to Host B; the router has no buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data.]

Causes/costs of congestion: scenario 2 (4)

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A sends to Host B; the router has free buffer space. Plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes marked at R/2.]

when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)

Causes/costs of congestion: scenario 2 (5)

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A keeps a copy, its timeout fires, and it sends again to Host B while the router has free buffer space. Plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes marked at R/2.]

when sending at R/2, some packets are retransmissions, including duplicates that are delivered

Causes/costs of congestion: scenario 2 (6)

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes marked at R/2.]

when sending at R/2, some packets are retransmissions, including duplicates that are delivered

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput
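The effect of duplicate copies on goodput can be illustrated with a one-line model. Everything here is an assumption of the sketch rather than something the slides state: the link carries at most R/2 per connection, and each delivered packet crosses the link `copies_per_packet` times on average.

```python
def goodput(offered_load, r, copies_per_packet=1.0):
    """Useful throughput when each packet crosses the link several times."""
    carried = min(offered_load, r / 2)    # per-connection share of the link
    return carried / copies_per_packet    # only one copy counts as goodput


print(goodput(0.5, 1.0))                          # no duplicates
print(goodput(0.5, 1.0, copies_per_packet=2.0))   # every packet sent twice
```

Under this model, a link that carries every packet twice delivers half the goodput for the same amount of work, which is the cost named in the list above.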

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0

[Figure: Hosts A, B, C, and D connected through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$ at Host A.]

Causes/costs of congestion: scenario 3 (2)

[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes marked at C/2.]

another "cost" of congestion:
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
      • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      • explicit rate for sender to send at

Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path "underloaded": sender should use available bandwidth
  • if sender's path congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      • NI bit: no increase in rate (mild congestion)
      • CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

Case study: ATM ABR congestion control (2)

two-byte ER (explicit rate) field in RM cell
  • congested switch may lower ER value in cell
  • senders' send rate thus max supportable rate on path

EFCI bit in data cells: set to 1 in congested switch
  • if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure legend: RM cell, data cell.]
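How the ER field ends up holding the maximum supportable rate on the path can be sketched as a running minimum over the switches. The `RmCell` class, the `forward_through` function, and the "none"/"mild"/"severe" congestion labels are invented for this example and do not reflect the actual ATM cell layout.

```python
from dataclasses import dataclass


@dataclass
class RmCell:
    er: float          # explicit rate requested by the sender
    ni: bool = False   # no-increase bit (mild congestion)
    ci: bool = False   # congestion-indication bit


def forward_through(cell, switches):
    """Pass an RM cell through switches given as (rate, congestion) pairs."""
    for supportable_rate, congestion in switches:
        cell.er = min(cell.er, supportable_rate)  # switch may only lower ER
        if congestion == "mild":
            cell.ni = True
        elif congestion == "severe":
            cell.ci = True
    return cell


path = [(80.0, "none"), (50.0, "mild"), (70.0, "none")]
cell = forward_through(RmCell(er=100.0), path)
print(cell)
```

Because each switch can only lower the value, the ER that returns to the sender is the rate of the most constrained switch on the path.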


TCP congestion control

Page 23: Congestion in networks

Transport Layer

3-23

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

lin original dataloutlin original data plus

retransmitted data

copy

free buffer space

R2

R2

l out

lin

Causescosts of congestion scenario 2

Host B

A

Transport Layer

3-24

lin original dataloutlin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer

3-25

lin original dataloutlin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2lin

l out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer

3-26

A

lin loutlincopy

free buffer space

timeout

R2

R2lin

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer

3-27

R2

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2lin

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer

3-28

four sendersmultihop paths timeoutretransmit

Q what happens as lin and lin

rsquo increase

finite shared output link buffers

Host A lout

Causescosts of congestion scenario 3

Host B

Host CHost D

lin original data

lin original data plus retransmitted data

A as red linrsquo increases all

arriving blue pkts at upper queue are dropped blue throughput g 0

Transport Layer

3-29

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Causescosts of congestion scenario 3

C2

C2

l out

linrsquo

Transport Layer

3-30

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end congestion

controlno explicit

feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer

3-31Case study ATM ABR congestion

controlABR available bit rateldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should use

available bandwidth

if senderrsquos path congested sender throttled to

minimum guaranteed rate

RM (resource management) cellssent by sender interspersed

with data cellsbits in RM cell set by switches

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild

congestion) CI bit congestion indication

RM cells returned to sender by receiver with bits intact

Transport Layer

3-32Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver sets CI bit in

returned RM cell

RM cell data cell

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control additive increase multiplicative decrease approach sender increases transmission

rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detectedmultiplicative decrease cut cwnd in half

after loss

cwnd

TCP

sen

der

cong

estio

n w

indo

w s

ize

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer

3-35TCP Congestion Control details

sender limits transmission

cwnd is dynamic function of perceived network congestion

TCP sending rate

roughly send cwnd bytes wait RTT for ACKS then send more byteslast byte

ACKed sent not-yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent-LastByteAcked

lt cwnd

sender sequence number space

rate~~cwndRTT bytessec

Transport Layer

3-36TCP Slow Start

when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for

every ACK received

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer

3-37

TCP detecting reacting to loss

loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer

3-38

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementationvariable ssthresh on loss event ssthresh is

set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer

3-39

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

TCP throughput

avg. TCP throughput as function of window size, RTT?
  • ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  • avg. window size (number of in-flight bytes) is 3/4 W
  • avg. throughput is 3/4 W per RTT

[Figure: sawtooth of window size over time, oscillating between W/2 and W]

$\text{avg TCP throughput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
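The 3/4 factor follows from averaging a window that ramps linearly from W/2 up to W. A short numerical check, assuming an idealized sawtooth; the function names and the example values (W = 100,000 bytes, RTT = 100 ms) are illustrative assumptions.

```python
def sawtooth_average(w: float, steps: int = 1000) -> float:
    """Average of a window that ramps linearly from w/2 to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput_bytes_per_sec(w_bytes: float, rtt_sec: float) -> float:
    """avg TCP throughput = 3/4 * W / RTT."""
    return 0.75 * w_bytes / rtt_sec


w = 100_000.0   # window size at loss, in bytes (assumed)
rtt = 0.1       # 100 ms (assumed)
print(sawtooth_average(w) / w)                 # close to 0.75
print(avg_throughput_bytes_per_sec(w, rtt))    # close to 750000 bytes/sec
```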

TCP Futures: TCP over "long, fat pipes"

example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
  • new versions of TCP needed for high-speed networks
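The two numbers in the example can be reproduced directly from the formula. A small sketch, assuming MSS is expressed in bytes and throughput in bits per second; the function names are illustrative.

```python
import math


def window_segments(target_bps: float, mss_bytes: int, rtt_sec: float) -> float:
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt_sec / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_sec: float, loss: float) -> float:
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_sec * math.sqrt(loss))


def required_loss(target_bps: float, mss_bytes: int, rtt_sec: float) -> float:
    """Solve the throughput formula for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * target_bps)) ** 2


print(window_segments(10e9, 1500, 0.1))   # about 83,333 segments
print(required_loss(10e9, 1500, 0.1))     # about 2e-10
```

Solving for L shows why the requirement is so strict: the tolerable loss rate falls with the square of the target throughput.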

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, each axis running to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
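The convergence toward the equal-share line can be checked with a toy simulation. It assumes both sessions see the same RTT, detect loss in the same round whenever their combined rate exceeds the link capacity, and increase by one unit per round; `aimd_two_flows` and the starting values are illustrative.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Simulate two AIMD flows sharing one bottleneck of the given capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both halve (multiplicative decrease)
        else:
            x1, x2 = x1 + 1, x2 + 1      # no loss: both add one unit (additive increase)
    return x1, x2


x1, x2 = aimd_two_flows(80.0, 10.0, capacity=100.0, rounds=1000)
print(abs(x1 - x2))   # very small: the two rates have nearly equalized
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the gap shrinks toward zero over repeated loss events.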

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP: do not want rate throttled by congestion control
  • instead use UDP: send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets roughly R/2 (11 of the 20 connections)
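The parallel-connection arithmetic can be written out explicitly. This sketch assumes every TCP connection on the link gets an equal share of R, which is the idealized fairness goal rather than a guarantee; `app_share` is a hypothetical helper.

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20, slightly more than half of R
```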

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 2 (2)
  • Causes/costs of congestion: scenario 2 (3)
  • Causes/costs of congestion: scenario 2 (4)
  • Causes/costs of congestion: scenario 2 (5)
  • Causes/costs of congestion: scenario 2 (6)
  • Causes/costs of congestion: scenario 3
  • Causes/costs of congestion: scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control: additive increase, multiplicative decrease
  • TCP Congestion Control: details
  • TCP Slow Start
  • TCP: detecting, reacting to loss
  • TCP: switching from slow start to CA
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures: TCP over "long, fat pipes"
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with no buffer space, keeping a copy of each packet; $\lambda_{out}$ is delivered toward Host B]

Causes/costs of congestion: scenario 2

Idealization: known loss
  • packets can be lost, dropped at router due to full buffers
  • sender only resends if packet known to be lost

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with free buffer space; $\lambda_{out}$ is delivered toward Host B; plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]

when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)

Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A sends $\lambda_{in}$ and $\lambda'_{in}$ toward Host B through a router with free buffer space; a premature timeout at the sender triggers a second copy; plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]

when sending at R/2, some packets are retransmissions, including duplicates that are delivered

Causes/costs of congestion: scenario 2

Realistic: duplicates
  • packets can be lost, dropped at router due to full buffers
  • sender times out prematurely, sending two copies, both of which are delivered

[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]

when sending at R/2, some packets are retransmissions, including duplicates that are delivered

"costs" of congestion:
  • more work (retransmissions) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt, decreasing goodput

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0

[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered data]

Causes/costs of congestion: scenario 3

another "cost" of congestion:
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted

[Figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to C/2]

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

network-assisted congestion control:
  • routers provide feedback to end systems
  • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  • explicit rate for sender to send at

Case study: ATM ABR congestion control

ABR: available bit rate:
  • "elastic service"
  • if sender's path "underloaded": sender should use available bandwidth
  • if sender's path congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
    NI bit: no increase in rate (mild congestion)
    CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

Case study: ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell:
  • congested switch may lower ER value in cell
  • sender's send rate thus max supportable rate on path

EFCI bit in data cells: set to 1 in congested switch
  • if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure legend: RM cell, data cell]
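The ER and EFCI rules can be illustrated with a toy model. This is only a sketch of the logic described on the slide, not of real ATM cell formats: rates are plain numbers, each switch is represented by the rate it can currently support, and both function names are hypothetical.

```python
def explicit_rate_on_path(sender_er: float, switch_limits: list[float]) -> float:
    """ER value in the RM cell after every switch on the path has seen it."""
    er = sender_er
    for limit in switch_limits:
        if limit < er:
            er = limit      # a congested switch may lower the ER value
    return er               # the maximum rate the whole path can support


def receiver_sets_ci(efci_of_preceding_data_cell: bool) -> bool:
    """Receiver sets CI in the returned RM cell if the preceding data cell had EFCI set."""
    return efci_of_preceding_data_cell


print(explicit_rate_on_path(100.0, [80.0, 40.0, 60.0]))   # 40.0
print(receiver_sets_ci(True))                             # True
```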


TCP congestion control

TCP congestion control: additive increase, multiplicative decrease

approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  • additive increase: increase cwnd by 1 MSS every RTT until loss detected
  • multiplicative decrease: cut cwnd in half after loss

[Figure: cwnd (TCP sender congestion window size) versus time; AIMD sawtooth behavior, probing for bandwidth: additively increase window size until loss occurs, then cut window in half]
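The sawtooth can be reproduced with a few lines. This sketch counts the window in MSS units and assumes, purely for illustration, that a loss is detected whenever the window reaches a fixed capacity; `aimd_trace` is a hypothetical name.

```python
def aimd_trace(capacity_mss: int, rtts: int, start: int = 1) -> list[int]:
    """Return cwnd (in MSS) at the start of each RTT under AIMD."""
    cwnd = start
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= capacity_mss:        # assumed loss signal
            cwnd = max(cwnd // 2, 1)    # multiplicative decrease: cut in half
        else:
            cwnd += 1                   # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8, 12))   # prints [1, 2, 3, 4, 5, 6, 7, 8, 4, 5, 6, 7]
```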

TCP Congestion Control: details

sender limits transmission:

LastByteSent - LastByteAcked ≤ cwnd

cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes

$\text{rate} \approx \frac{cwnd}{RTT}$ bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), and last byte sent; the in-flight span is bounded by cwnd]
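The window limit and the rate approximation translate into two one-line checks. A minimal sketch, assuming byte counters for the sequence space and ignoring the receiver's flow-control window; the function names and example numbers are illustrative.

```python
def can_send(last_byte_sent: int, last_byte_acked: int, cwnd: int, segment_len: int) -> bool:
    """True if sending segment_len more bytes keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + segment_len <= cwnd


def approx_rate_bytes_per_sec(cwnd: int, rtt_sec: float) -> float:
    """rate is roughly cwnd / RTT."""
    return cwnd / rtt_sec


print(can_send(3000, 1000, cwnd=4000, segment_len=1000))   # True: 2000 in flight
print(can_send(5000, 1000, cwnd=4000, segment_len=1000))   # False: window is full
print(approx_rate_bytes_per_sec(14_600, 0.1))              # roughly 146,000 bytes/sec
```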

Page 27: Congestion in networks

Transport Layer

3-27

R2

l out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2lin

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer

3-28

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers. Labels: $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data plus retransmitted data), $\lambda_{out}$.]

A: as red $\lambda'_{in}$ increases, all arriving blue packets at the upper queue are dropped, and blue throughput goes to 0.

Transport Layer

3-29

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted.

[Figure: $\lambda_{out}$ versus $\lambda'_{in}$; both axes marked at C/2]

Transport Layer

3-30

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:

  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

Network-assisted congestion control:

  • routers provide feedback to end systems
  • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  • explicit rate for sender to send at

Transport Layer

3-31

Case study: ATM ABR congestion control

ABR: available bit rate

  • "elastic service"
  • if sender's path is "underloaded": sender should use available bandwidth
  • if sender's path is congested: sender throttled to minimum guaranteed rate

RM (resource management) cells:

  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches ("network-assisted")
      • NI bit: no increase in rate (mild congestion)
      • CI bit: congestion indication
  • RM cells returned to sender by receiver, with bits intact

Transport Layer

3-32

Case study: ATM ABR congestion control

Two-byte ER (explicit rate) field in RM cell:

  • congested switch may lower ER value in cell
  • sender's send rate is thus the maximum supportable rate on the path

EFCI bit in data cells: set to 1 in a congested switch

  • if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell

[Figure legend: RM cell, data cell]
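The explicit-rate mechanism can be pictured as taking a minimum along the path. The following is only a minimal sketch under that reading: the function name `path_explicit_rate` and the idea of giving each switch a single "supportable rate" number are illustrative assumptions, not part of the ATM specification or the slides.

```python
def path_explicit_rate(initial_er, switch_supportable_rates):
    """Return the ER value the sender sees when the RM cell comes back.

    initial_er: rate the sender wrote into the RM cell (hypothetical units).
    switch_supportable_rates: one number per switch on the path; a congested
    switch may lower ER to what it can support (assumed model).
    """
    er = initial_er
    for supportable in switch_supportable_rates:
        # A switch never raises ER; it may only lower it.
        er = min(er, supportable)
    return er


# Three switches; the third is the most congested one.
print(path_explicit_rate(100.0, [80.0, 120.0, 60.0]))
```

Under this model the returned ER equals the smallest rate any switch on the path can support, which is why the sender's rate ends up at the maximum supportable rate on the path.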

Transport Layer

3-33

TCP congestion control

Transport Layer

3-34

TCP congestion control: additive increase, multiplicative decrease

Approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs

  • additive increase: increase cwnd by 1 MSS every RTT until loss detected
  • multiplicative decrease: cut cwnd in half after loss

[Figure: cwnd (TCP sender congestion window size) versus time. AIMD sawtooth behavior, probing for bandwidth: additively increase window size until loss occurs, then cut window in half.]
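The sawtooth can be reproduced with a few lines of simulation. This is a minimal sketch that works in units of one MSS and takes the RTTs in which a loss is detected as an input; the function name `aimd_trace` and its parameters are hypothetical, chosen only for illustration.

```python
def aimd_trace(rounds, loss_at, start=1.0, mss=1.0):
    """Return cwnd (in MSS) at the start of each RTT under pure AIMD.

    loss_at: set of RTT indices in which a loss is detected (assumed input).
    """
    cwnd = start
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_at:
            cwnd = max(mss, cwnd / 2)  # multiplicative decrease: halve cwnd
        else:
            cwnd += mss                # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(rounds=8, loss_at={3}, start=4.0))
# [4.0, 5.0, 6.0, 7.0, 3.5, 4.5, 5.5, 6.5]
```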

Transport Layer

3-35

TCP Congestion Control: details

Sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$$

cwnd is a dynamic function of perceived network congestion.

[Figure: sender sequence number space, showing the last byte ACKed, the bytes sent but not yet ACKed ("in-flight", spanning cwnd), and the last byte sent]

TCP sending rate: roughly, send cwnd bytes, wait one RTT for ACKs, then send more bytes:

$$\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}} \ \text{bytes/sec}$$
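As a small illustration of the window constraint and the rate approximation, the sketch below checks whether more bytes may be sent and estimates the sending rate. The helper names `can_send` and `approx_rate` are hypothetical, and flow control (the receive window) is deliberately ignored.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight_after = (last_byte_sent + nbytes) - last_byte_acked
    return in_flight_after <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cwnd_bytes / rtt_seconds


# 600 bytes in flight, cwnd of 1000 bytes: 400 more bytes fit, 401 do not.
print(can_send(1000, 400, 1000, 400), can_send(1000, 400, 1000, 401))
print(approx_rate(10000, 0.25))  # bytes/sec for a 10,000-byte window, 250 ms RTT
```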

Transport Layer

3-36

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:

  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd for every ACK received

Summary: initial rate is slow but ramps up exponentially fast.

[Figure: timeline between Host A and Host B; Host A sends one segment, then two segments, then four segments in successive RTTs]
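The doubling follows directly from the per-ACK increment. The sketch below assumes every segment is ACKed individually and nothing is lost; `slow_start_rounds` is a hypothetical name, and cwnd is counted in whole MSS units.

```python
def slow_start_rounds(rtts, mss=1):
    """Return cwnd (in MSS) at the start of each RTT during slow start.

    Assumes one ACK per segment and no loss (simplifying assumptions).
    """
    cwnd = mss
    sizes = []
    for _ in range(rtts):
        sizes.append(cwnd)
        acks_this_rtt = cwnd // mss    # one ACK comes back per segment sent
        for _ in range(acks_this_rtt):
            cwnd += mss                # increment cwnd by 1 MSS per ACK
    return sizes


print(slow_start_rounds(4))  # [1, 2, 4, 8]
```

Because a window of n segments produces n ACKs, n increments of one MSS double the window each round trip.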

Transport Layer

3-37

TCP: detecting, reacting to loss

Loss indicated by timeout:

  • cwnd set to 1 MSS
  • window then grows exponentially (as in slow start) to threshold, then grows linearly

Loss indicated by 3 duplicate ACKs (TCP Reno):

  • dup ACKs indicate network capable of delivering some segments
  • cwnd is cut in half, window then grows linearly

TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs).

Transport Layer

3-38

TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?

A: when cwnd gets to 1/2 of its value before timeout.

Implementation:

  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event
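The loss reactions of Tahoe and Reno, together with the ssthresh rule, can be sketched as two small functions. This is a simplified model in MSS units: the names `react_to_loss` and `grow_one_rtt` are hypothetical, and capping the exponential phase exactly at ssthresh is an assumption made to keep the example tidy.

```python
def react_to_loss(cwnd, event, variant="reno", mss=1.0):
    """Return (new_cwnd, ssthresh) after a loss event.

    event: "timeout" or "3dupacks"; variant: "reno" or "tahoe".
    """
    ssthresh = cwnd / 2                # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh           # restart from 1 MSS
    return ssthresh, ssthresh          # Reno on 3 dup ACKs: cut cwnd in half


def grow_one_rtt(cwnd, ssthresh, mss=1.0):
    """Grow cwnd for one RTT: exponential below ssthresh, linear after."""
    if cwnd < ssthresh:
        return min(2 * cwnd, ssthresh)
    return cwnd + mss


cwnd, ssthresh = react_to_loss(16.0, "timeout")
trace = [cwnd]
for _ in range(5):
    cwnd = grow_one_rtt(cwnd, ssthresh)
    trace.append(cwnd)
print(trace)  # [1.0, 2.0, 4.0, 8.0, 9.0, 10.0]
```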

Transport Layer

3-39

Summary: TCP Congestion Control

[Figure: finite state machine with three states: slow start, congestion avoidance and fast recovery. Its transitions are listed here as "event: actions".]

Initialization (enter slow start):

  • cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0

Slow start:

  • new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
  • cwnd > ssthresh: no action; go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Congestion avoidance:

  • new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
  • dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Fast recovery:

  • duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  • new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  • timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
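The three-state machine summarized on this slide can be written down as a small class. This is a sketch only: it tracks cwnd, ssthresh and the duplicate-ACK counter but sends no real segments, it counts in units of one MSS, and the initial ssthresh of 64 (standing in for the slide's 64 KB) and the class name `RenoSender` are assumptions made for illustration.

```python
MSS = 1.0  # work in units of one MSS (assumption, for readability)


class RenoSender:
    """Congestion-control state only; retransmission itself is not modelled."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * MSS   # stands in for the slide's 64 KB (assumption)
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += MSS
            self.dup_acks = 0
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * (MSS / self.cwnd)   # about +1 MSS per RTT
            self.dup_acks = 0
        else:  # fast_recovery
            self.cwnd = self.ssthresh
            self.dup_acks = 0
            self.state = "congestion_avoidance"

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"   # missing segment is retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"          # missing segment is retransmitted here


sender = RenoSender()
sender.ssthresh = 4 * MSS                  # small threshold so the demo is short
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cwnd)           # congestion_avoidance 5.0
```

Keeping the events as methods and the state as a string mirrors the "event: actions" layout of the diagram, so each branch can be checked against one arrow.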

Transport Layer

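3-40

TCP throughput

Average TCP throughput as a function of window size and RTT (ignore slow start, assume there is always data to send):

  • W: window size (measured in bytes) where loss occurs
  • average window size (number of in-flight bytes) is 3/4 W
  • average throughput is 3/4 W per RTT

[Figure: sawtooth window size oscillating between W/2 and W]

$$\text{avg TCP throughput} = \frac{3}{4}\,\frac{W}{\text{RTT}} \ \text{bytes/sec}$$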
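The 3/4 factor is simply the mean of a window that ramps linearly from W/2 up to W. The sketch below checks this numerically; the function names and the sampling approach are illustrative assumptions, not part of the slides.

```python
def avg_window(w, steps=1000):
    """Mean of a window that rises linearly from w/2 to w (one sawtooth tooth)."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec: three quarters of W per RTT."""
    return 0.75 * w_bytes / rtt_seconds


print(round(avg_window(1000.0), 6))   # 750.0, i.e. 3/4 of W
print(avg_throughput(1000, 0.5))      # 1500.0 bytes/sec
```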

Transport Layer

3-41

TCP Futures: TCP over "long, fat pipes"

  • example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

$$\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
  • new versions of TCP for high-speed
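Both numbers on this slide can be checked with a short calculation. The sketch below converts the 1500-byte MSS to bits so that it matches a throughput given in bits per second; the function names are hypothetical.

```python
import math


def required_window_segments(target_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS converted to bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(target_bps, rtt_s, mss_bytes):
    """Solve the same formula for L at a given target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(w))          # 83333 segments in flight
print(f"{loss:.1e}")   # about 2e-10
```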

Transport Layer

3-42

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Transport Layer

3-43

Why is TCP fair?

Two competing sessions:

  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
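The convergence argument can be seen in a toy simulation of two AIMD flows. This sketch assumes both flows have the same RTT and detect loss at the same moment, whenever their combined rate exceeds the link capacity; the function name `two_flow_aimd` and the numbers used are illustrative only.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Evolve two flows' throughputs under synchronized AIMD (assumed model)."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both flows add the same amount.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = two_flow_aimd(10.0, 80.0, capacity=100.0, steps=500)
print(abs(x1 - x2) < 0.01)  # True: the flows have converged to an equal share
```

Additive increase leaves the gap between the two flows unchanged, while every halving cuts the gap in half, so the gap shrinks toward zero.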

Transport Layer

3-44

Fairness (more)

Fairness and UDP:

  • multimedia apps often do not use TCP: do not want rate throttled by congestion control
  • instead use UDP: send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections:

  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets R/2
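The parallel-connection example is a per-connection share calculation. The sketch below assumes the link is split equally among all TCP connections; `app_share` is a hypothetical helper, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def app_share(existing_connections, new_app_connections):
    """Fraction of link rate R the new app gets with equal per-connection shares."""
    total = existing_connections + new_app_connections
    return Fraction(new_app_connections, total)


print(app_share(9, 1))    # 1/10 of R
print(app_share(9, 11))   # 11/20 of R, roughly R/2
```

With 11 connections the new application holds 11 of the 20 connections on the link, which is where the slide's "about R/2" comes from.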




  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 2 (3)
  • Causescosts of congestion scenario 2 (4)
  • Causescosts of congestion scenario 2 (5)
  • Causescosts of congestion scenario 2 (6)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 31: Congestion in networks

Transport Layer

3-31Case study ATM ABR congestion

controlABR available bit rateldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should use

available bandwidth

if senderrsquos path congested sender throttled to

minimum guaranteed rate

RM (resource management) cellssent by sender interspersed

with data cellsbits in RM cell set by switches

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild

congestion) CI bit congestion indication

RM cells returned to sender by receiver with bits intact

3-32 Case study: ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell
  • congested switch may lower ER value in cell
  • sender's send rate is thus the maximum supportable rate on the path

EFCI bit in data cells: set to 1 in congested switch
  • if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

[Figure legend: RM cell, data cell]
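The ER and EFCI rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ATM specification: the `RMCell` class, the function names, and the per-switch rate list are hypothetical, and each switch is assumed to lower ER to the rate it can currently support.

```python
from dataclasses import dataclass


@dataclass
class RMCell:
    er: float          # explicit rate field (same units as the switch rates)
    ci: bool = False   # congestion indication bit
    ni: bool = False   # no-increase bit


def forward_through_switches(cell, supportable_rates):
    """Assumed model: every switch lowers ER to what it can support."""
    for rate in supportable_rates:
        cell.er = min(cell.er, rate)
    return cell


def receiver_return(cell, preceding_data_cell_efci):
    """Receiver sets CI if the data cell before the RM cell had EFCI set."""
    if preceding_data_cell_efci:
        cell.ci = True
    return cell


# Hypothetical path with three switches; the middle one is the bottleneck.
cell = forward_through_switches(RMCell(er=100.0), [80.0, 45.0, 60.0])
cell = receiver_return(cell, preceding_data_cell_efci=True)
print(cell.er, cell.ci)
```

The ER value that comes back is the minimum over the path, which is why the sender can treat it as the maximum supportable rate.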

3-33 TCP congestion control

3-34 TCP congestion control: additive increase, multiplicative decrease

approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  • additive increase: increase cwnd by 1 MSS every RTT until loss detected
  • multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD sawtooth behavior, probing for bandwidth. Vertical axis: cwnd, TCP sender congestion window size; horizontal axis: time. The window size increases additively until loss occurs, then is cut in half.]
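A minimal sketch of the sawtooth, assuming a loss occurs whenever the window reaches a fixed size (`loss_at` is a hypothetical parameter; real losses depend on the network). cwnd is counted in MSS and sampled once per RTT.

```python
def aimd_trace(rounds, loss_at, start=1.0):
    """Return cwnd (in MSS) at the start of each RTT round."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = cwnd / 2      # multiplicative decrease after loss
        else:
            cwnd = cwnd + 1      # additive increase: 1 MSS per RTT
    return trace


print(aimd_trace(12, loss_at=8.0, start=4.0))
```

Plotting the returned list against the round index gives the sawtooth shape described on the slide.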

3-35 TCP Congestion Control: details

sender limits transmission:

  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$

cwnd is dynamic, a function of perceived network congestion

TCP sending rate:
  • roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes

  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), last byte sent, and cwnd spanning the in-flight bytes]
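The window check and the rate estimate can be written out directly. This is a sketch with hypothetical function names; it ignores the receive window (flow control), which a real sender also has to respect.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps the in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cwnd_bytes / rtt_seconds


# Example values are assumptions: a 14,600-byte window and a 100 ms RTT.
print(can_send(5000, 2000, 4000, 1000))
print(approx_rate(14600, 0.1))
```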

3-36 TCP Slow Start

when connection begins, increase rate exponentially until first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd for every ACK received

summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B. A sends one segment in the first RTT, two segments in the second, four segments in the third.]
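Why a per-ACK increment doubles the window each RTT can be seen in a short sketch. It assumes one ACK per segment (no delayed ACKs) and no loss; the function name is hypothetical.

```python
def slow_start_rounds(num_rtts, mss=1):
    """cwnd at the start of each RTT when cwnd grows by 1 MSS per ACK."""
    cwnd = mss
    history = []
    for _ in range(num_rtts):
        history.append(cwnd)
        acks = cwnd // mss           # one ACK per segment sent this round
        for _ in range(acks):
            cwnd += mss              # increment cwnd for every ACK received
    return history


print(slow_start_rounds(5))
```

Each round sends cwnd/MSS segments, so it receives that many ACKs and adds that many MSS: the window doubles.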

3-37 TCP: detecting, reacting to loss

loss indicated by timeout:
  • cwnd set to 1 MSS
  • window then grows exponentially (as in slow start) to threshold, then grows linearly

loss indicated by 3 duplicate ACKs: TCP Reno
  • dup ACKs indicate network capable of delivering some segments
  • cwnd is cut in half, window then grows linearly

TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
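The difference between the two variants comes down to one branch. The sketch below is a simplification with hypothetical names: it returns the window after the loss is handled and leaves out Reno's temporary window inflation during fast recovery.

```python
def on_loss(cwnd, event, variant, mss=1):
    """Return (new_cwnd, ssthresh) after a loss event.

    event is "timeout" or "3dupacks"; variant is "tahoe" or "reno".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown event: " + event)
    if variant not in ("tahoe", "reno"):
        raise ValueError("unknown variant: " + variant)
    ssthresh = cwnd / 2
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh          # restart from 1 MSS
    return ssthresh, ssthresh         # Reno on 3 dup ACKs: cut cwnd in half


print(on_loss(16, "3dupacks", "reno"))
print(on_loss(16, "3dupacks", "tahoe"))
```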

3-38 TCP: switching from slow start to CA (congestion avoidance)

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  • variable ssthresh
  • on loss event, ssthresh is set to 1/2 of cwnd just before loss event
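A sketch of the switch, tracing cwnd per RTT (in MSS) after a timeout. Capping the last doubling at ssthresh is an assumption made here to keep the trace tidy; the function name is hypothetical.

```python
def cwnd_growth(cwnd_before_loss, rtts):
    """cwnd per RTT after a timeout: exponential up to ssthresh, then linear."""
    ssthresh = cwnd_before_loss / 2
    cwnd = 1.0
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start, capped at threshold
        else:
            cwnd += 1                        # congestion avoidance
    return trace


print(cwnd_growth(16, 7))
```

With a window of 16 MSS before the loss, ssthresh is 8: the trace doubles up to 8 and then grows by 1 per RTT.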

3-39 Summary: TCP Congestion Control

[Figure: FSM with three states: slow start, congestion avoidance, fast recovery. Its transitions are listed here as event: actions.]

initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
  • new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
  • cwnd > ssthresh (Λ): go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

congestion avoidance
  • new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

fast recovery
  • duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  • new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
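The state machine can be sketched as a small class. This is an illustration only: the class and method names are hypothetical, windows are in bytes, and the actual sending and retransmission of segments is left out so that only the window bookkeeping is shown.

```python
class RenoSender:
    """Window bookkeeping for the three-state congestion control FSM."""

    def __init__(self, mss=1000):
        self.mss = mss
        self.cwnd = mss
        self.ssthresh = 64 * 1024
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += self.mss                # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow start"


sender = RenoSender(mss=1000)
for _ in range(3):
    sender.on_new_ack()
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cwnd, sender.ssthresh)
```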

3-40 TCP throughput

avg. TCP throughput as function of window size, RTT?
  • ignore slow start, assume always data to send

W: window size (measured in bytes) where loss occurs
  • avg. window size (number of in-flight bytes) is 3/4 W
  • avg. throughput is 3/4 W per RTT

  $\text{avg TCP throughput} = \frac{3}{4} \frac{W}{\text{RTT}}$ bytes/sec

[Figure: sawtooth of window size oscillating between W/2 and W]
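The 3/4 factor is the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, with hypothetical function names and an idealized sawtooth:

```python
def average_window(w, steps=1000):
    """Average of a window that ramps linearly from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec for the idealized sawtooth."""
    return 0.75 * w_bytes / rtt_seconds


print(average_window(100.0))
print(avg_throughput(100_000, 0.1))
```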

3-41 TCP Futures: TCP over "long, fat pipes"

example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

  $\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \sqrt{L}}$

  • to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
  • new versions of TCP for high-speed
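Both numbers in the example can be reproduced from the formula. The function names are hypothetical; MSS is converted to bits so that throughput comes out in bits per second.

```python
import math


def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve the formula above for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(window_segments(10e9, 0.1, 1500))
print(required_loss_rate(10e9, 0.1, 1500))
```

The loss rate evaluates to roughly 2.1e-10, which the slide rounds to 2·10^-10.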

3-42 TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

3-43 Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running from 0 to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
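The convergence argument can be checked with a toy simulation of two flows. It assumes both flows see a loss in the same round whenever their combined rate exceeds the capacity, which is an idealization; the function name and starting rates are hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Return the two flows' rates after the given number of rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both halve their rate
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase, slope 1
    return x1, x2


x1, x2 = two_flow_aimd(1.0, 80.0, 100.0, 500)
print(x1, x2)
```

Additive increase leaves the gap between the flows unchanged and every halving cuts it in half, so the two rates drift toward the equal-share line.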

3-44 Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      • do not want rate throttled by congestion control
  • instead use UDP:
      • send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets about R/2 (11 of 20 connections)
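The arithmetic behind the parallel-connection example, assuming every connection on the link gets an equal share (the function name is hypothetical):

```python
from fractions import Fraction


def app_share(new_conns, existing_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, new_conns + existing_conns)


print(app_share(1, 9))
print(app_share(11, 9))
```

With 11 connections the app holds 11 of 20, slightly more than half of R.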

  • Congestion in networks
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • TCP retransmission scenarios (2)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • TCP fast retransmit (2)
  • TCP flow control
  • TCP flow control (2)
  • Connection Management
  • Agreeing to establish a connection
  • Agreeing to establish a connection (2)
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • TCP closing a connection (2)
  • Principles of congestion control
  • Principles of congestion control (2)
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 2 (3)
  • Causescosts of congestion scenario 2 (4)
  • Causescosts of congestion scenario 2 (5)
  • Causescosts of congestion scenario 2 (6)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Case study ATM ABR congestion control (2)
  • TCP congestion control
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP switching from slow start to CA
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 36: Congestion in networks

Transport Layer

3-36TCP Slow Start

when connection begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every RTT done by incrementing cwnd for

every ACK received

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer

3-37

TCP detecting reacting to loss

loss indicated by timeoutcwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold then

grows linearlyloss indicated by 3 duplicate ACKs TCP RENOdup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer

3-38

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementationvariable ssthresh on loss event ssthresh is

set to 12 of cwnd just before loss event

TCP switching from slow start to CA

Transport Layer

3-39

Summary: TCP Congestion Control

[Figure: FSM with three states (slow start, congestion avoidance, fast recovery). Transitions are listed below as event: actions.]

initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0, enter slow start

slow start
  • new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment (stay in slow start)
  • cwnd > ssthresh (Λ): go to congestion avoidance
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment, go to fast recovery

congestion avoidance
  • new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  • duplicate ACK: dupACKcount++
  • timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
  • dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment, go to fast recovery

fast recovery
  • duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  • new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance
  • timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment, go to slow start
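The state machine on this slide can be sketched as a small event-driven class. Assumptions are flagged in the comments: cwnd and ssthresh are tracked in bytes, the slide's "ssthresh + 3" is read as ssthresh + 3 MSS, the default MSS of 1460 bytes is an arbitrary choice, and segment (re)transmission is not modeled at all. The class name `RenoFsm` is hypothetical.

```python
class RenoFsm:
    """Sketch of the slide's three-state congestion-control FSM (window bookkeeping only)."""

    def __init__(self, mss=1460):               # 1460 bytes is an assumed MSS
        self.mss = mss
        self.cwnd = float(mss)                  # cwnd = 1 MSS
        self.ssthresh = 64 * 1024.0             # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow start"

    def on_new_ack(self):
        if self.state == "slow start":
            self.cwnd += self.mss               # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        elif self.state == "congestion avoidance":
            self.cwnd += self.mss * (self.mss / self.cwnd)   # about +1 MSS per RTT
        else:                                   # leaving fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += self.mss               # inflate window per extra duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss   # assumption: "+ 3" means 3 MSS
            self.state = "fast recovery"        # a real sender retransmits here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow start"               # a real sender retransmits here


if __name__ == "__main__":
    fsm = RenoFsm(mss=1000)
    for _ in range(3):
        fsm.on_new_ack()
    for _ in range(3):
        fsm.on_dup_ack()
    print(fsm.state, fsm.cwnd, fsm.ssthresh)    # fast recovery 5000.0 2000.0
```

Keeping one method per event mirrors the figure: each arrow is an (event, actions, next state) triple, so every branch above corresponds to one labeled transition.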

Transport Layer

3-40

TCP throughput

  • avg. TCP thruput as function of window size, RTT?
      • ignore slow start, assume always data to send
  • W: window size (measured in bytes) where loss occurs
      • avg. window size (# in-flight bytes) is 3/4 W
      • avg. thruput is 3/4 W per RTT

[Figure: sawtooth of window size over time, oscillating between W/2 and W]

avg TCP thruput = $\frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
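The 3/4 W figure can be checked numerically. The sketch below assumes an idealized sawtooth that climbs by one unit per RTT from W/2 to W and then halves, which is the model the slide uses; both function names are made up for the example.

```python
def sawtooth_average(w):
    """Average window over one cycle climbing from w/2 to w by 1 per RTT (w even)."""
    cycle = list(range(w // 2, w + 1))
    return sum(cycle) / len(cycle)


def avg_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


if __name__ == "__main__":
    print(sawtooth_average(100))            # 75.0, i.e. 3/4 of W = 100
    print(avg_throughput(100_000, 0.5))     # 150000.0 bytes/sec
```

The average of a linear ramp is the midpoint of its endpoints, (W/2 + W)/2 = 3W/4, which is where the factor comes from.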

Transport Layer

3-41

TCP Futures: TCP over "long, fat pipes"

  • example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  • requires W = 83,333 in-flight segments
  • throughput in terms of segment loss probability, L [Mathis 1997]:

TCP throughput = $\frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

  • to achieve 10 Gbps throughput, need a loss rate of L = $2 \cdot 10^{-10}$, a very small loss rate
  • new versions of TCP for high-speed
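Both numbers on this slide follow from short arithmetic, reproduced here as a sketch. It assumes MSS is converted to bits (1500 bytes = 12,000 bits) so that throughput comes out in bits per second; the function names are invented for the example.

```python
import math


def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to fill the pipe: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS expressed in bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Solve the formula above for L at a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(round(required_window_segments(10e9, 0.1, 1500)))   # 83333
    print(required_loss(10e9, 0.1, 1500))                     # about 2.1e-10
```

Because throughput scales with 1/sqrt(L), each tenfold increase in target rate needs a hundredfold lower loss rate, which is why the required L becomes so small.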

Transport Layer

3-42

TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Transport Layer

3-43

Why is TCP fair?

two competing sessions:
  • additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
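The convergence argument can be illustrated with a toy simulation. It rests on strong assumptions that are not stated on the slide: both flows have the same RTT, both see every loss at the same time, and loss occurs exactly when their combined rate exceeds the capacity R. The function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # loss: multiplicative decrease for both
        else:
            x1, x2 = x1 + 1, x2 + 1     # no loss: additive increase for both
    return x1, x2


if __name__ == "__main__":
    a, b = aimd_two_flows(10.0, 70.0, capacity=100.0, rounds=2000)
    print(abs(a - b) < 1e-9)            # True: the two rates have converged
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts that gap in half, so the operating point drifts toward the equal-share line.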

Transport Layer

3-44

Fairness (more)

Fairness and UDP
  • multimedia apps often do not use TCP
      • do not want rate throttled by congestion control
  • instead use UDP:
      • send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
      • new app asks for 1 TCP, gets rate R/10
      • new app asks for 11 TCPs, gets R/2
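The parallel-connection example is simple arithmetic, sketched below under the assumption that the bottleneck rate R is split equally among all connections. The function name `app_share` is made up for the example.

```python
from fractions import Fraction


def app_share(existing, new_connections):
    """Fraction of R an app gets when it opens `new_connections` alongside `existing`."""
    return Fraction(new_connections, existing + new_connections)


if __name__ == "__main__":
    print(app_share(9, 1))      # 1/10
    print(app_share(9, 11))     # 11/20
```

With 11 connections the exact share under this assumption is 11R/20, slightly more than the R/2 quoted on the slide.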
