
Chapter 3: Transport Layer

Our goals:
  • understand principles behind transport layer services:
      • multiplexing/demultiplexing
      • reliable data transfer
      • flow control
      • congestion control
  • learn about transport layer protocols in the Internet:
      • UDP: connectionless transport
      • TCP: connection-oriented transport
      • TCP congestion control

Transport services and protocols
  • provide logical communication between app processes running on different hosts
  • transport protocols run in end systems
  • reliable, in-order delivery: TCP
  • unreliable, unordered delivery: UDP

[Figure: two end hosts with full application/transport/network/data link/physical stacks, connected through nodes that implement only network/data link/physical; the transport layer provides logical end-end transport.]

Multiplexing/demultiplexing
  • Demultiplexing at rcv host: delivering received segments to correct socket
  • Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)

[Figure: host 1, host 2 and host 3, each with an application/transport/network/link/physical stack; processes P1 to P4 attach to the transport layer through sockets.]

How demultiplexing works
  • host receives IP datagrams
  • host uses IP addresses & port numbers to direct segment to appropriate socket
  • UDP socket identified by two-tuple: (dest IP address, dest port number)
  • TCP socket identified by 4-tuple: (source IP address, source port number, dest IP address, dest port number)

[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (message).]
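The two lookup rules above can be sketched as a pair of dictionaries. This is only an illustration, not how any real operating system stores its sockets; the class name, method names and the process labels used as dictionary values are hypothetical.

```python
class Demultiplexer:
    """Toy model of transport-layer demultiplexing at a receiving host."""

    def __init__(self):
        # UDP: keyed by the two-tuple (dest IP, dest port)
        self.udp_sockets = {}
        # TCP: keyed by the 4-tuple (source IP, source port, dest IP, dest port)
        self.tcp_sockets = {}

    def deliver_udp(self, src_ip, src_port, dest_ip, dest_port):
        # The source fields are ignored for the lookup; they only serve
        # as a "return address" for the application.
        return self.udp_sockets.get((dest_ip, dest_port))

    def deliver_tcp(self, src_ip, src_port, dest_ip, dest_port):
        # All four fields take part in the lookup.
        return self.tcp_sockets.get((src_ip, src_port, dest_ip, dest_port))


demux = Demultiplexer()
demux.udp_sockets[("C", 6428)] = "udp-server"
demux.tcp_sockets[("A", 9157, "C", 80)] = "web-conn-1"
demux.tcp_sockets[("B", 9157, "C", 80)] = "web-conn-2"
demux.tcp_sockets[("B", 5775, "C", 80)] = "web-conn-3"
```

With UDP, segments from different senders that share a destination port land in the same socket; with TCP, each distinct 4-tuple maps to its own socket even when the destination port is the same.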

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: client P2 at IP B, client P1 at IP A, and server processes at IP C. Segments to the server carry SP 9157, DP 6428 and SP 5775, DP 6428; the replies carry SP 6428, DP 9157 and SP 6428, DP 5775.]

SP provides "return address"

Connection-oriented demux (cont)

[Figure: client processes at IP A and IP B, and server processes at IP C. Three segments arrive at the server, all with DP 80: (S-IP A, SP 9157, D-IP C), (S-IP B, SP 9157, D-IP C) and (S-IP B, SP 5775, D-IP C).]

UDP: User Datagram Protocol [RFC 768]
  • "no frills," "bare bones" Internet transport protocol
  • "best effort" service, UDP segments may be:
      • lost
      • delivered out of order to app
  • connectionless:
      • no handshaking between UDP sender, receiver
      • each UDP segment handled independently of others

Why is there a UDP?
  • no connection establishment (which can add delay)
  • simple: no connection state at sender, receiver
  • small segment header
  • no congestion control: UDP can blast away as fast as desired

UDP: more
  • often used for streaming multimedia apps
      • loss tolerant
      • rate sensitive
  • other UDP uses: DNS, SNMP
  • reliable transfer over UDP: add reliability at application layer
      • application-specific error recovery

[Figure: UDP segment format, 32 bits wide: source port #, dest port #, length, checksum, application data (message). Length is in bytes of the UDP segment, including header.]

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
  • treat segment contents as sequence of 16-bit integers
  • checksum: addition (1's complement sum) of segment contents
  • sender puts checksum value into UDP checksum field

Receiver:
  • compute checksum of received segment
  • check if computed checksum equals checksum field value:
      • NO: error detected
      • YES: no error detected. But maybe errors nonetheless? More later...
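The sender and receiver steps above can be sketched in a few lines. The sketch assumes the usual Internet checksum convention (the field holds the bitwise complement of the 1's complement sum, with odd-length data padded by a zero byte) and covers only the bytes passed in; real UDP also sums a pseudo-header, which is left out here. The function names are illustrative.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add the data as 16-bit big-endian words, wrapping carries around."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def checksum16(data: bytes) -> int:
    """Value the sender would place in the checksum field."""
    return ~ones_complement_sum16(data) & 0xFFFF


def looks_intact(data: bytes, checksum_field: int) -> bool:
    """Receiver check: recompute and compare with the received field."""
    return checksum16(data) == checksum_field


segment = bytes.fromhex("0001f203f4f5f6f7")
field = checksum16(segment)  # 0x220D for this input
```

A `True` result only means no error was detected: two bit flips that cancel in the sum would pass unnoticed, which is the "maybe errors nonetheless" caveat on the slide.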

Principles of Reliable data transfer
  • important in app., transport, link layers
  • characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Rdt1.0: reliable transfer over a reliable channel
  • underlying channel perfectly reliable
      • no bit errors
      • no loss of packets

Rdt2.0: channel with bit errors
  • underlying channel may flip bits in packet
      • checksum to detect bit errors
  • the question: how to recover from errors?
      • acknowledgements (ACKs)
      • negative acknowledgements (NAKs)
      • sender retransmits pkt on receipt of NAK
  • new mechanisms in rdt2.0:
      • error detection
      • receiver feedback

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
  • sender doesn't know what happened at receiver!
  • can't just retransmit: possible duplicate

Handling duplicates:
  • sender adds sequence number to each pkt
  • sender retransmits current pkt if ACK/NAK garbled
  • receiver discards (doesn't deliver up) duplicate pkt

stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1

Sender:
  • seq # added to pkt
  • two seq. #'s (0,1) will suffice
  • must check if received ACK/NAK corrupted

Receiver:
  • must check if received packet is duplicate
  • note: receiver cannot know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol
  • same functionality as rdt2.1, using ACKs only
  • instead of NAK, receiver sends ACK for last pkt received OK
  • duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  • checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
  • retransmits if no ACK received in this time
  • if pkt (or ACK) just delayed (not lost):
      • retransmission will be duplicate, but use of seq. #'s already handles this
      • receiver must specify seq # of pkt being ACKed
  • requires countdown timer
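A small simulation may help show why the 1-bit sequence number plus a timer is enough for a lossy channel. It is a sketch under simplifying assumptions: loss is scripted by transmission index rather than random, a timeout is modelled as simply looping again, and bit corruption is not modelled. The function and parameter names are hypothetical.

```python
def stop_and_wait(messages, lost_data=frozenset(), lost_acks=frozenset()):
    """Simulate a stop-and-wait sender/receiver pair with seq #'s 0 and 1.

    lost_data / lost_acks hold the indices (0-based, counted over all data
    transmissions) whose data pkt or returning ACK the channel drops.
    """
    delivered = []      # what the receiver hands to its upper layer
    expected = 0        # receiver: seq # it is waiting for
    seq = 0             # sender: seq # of the current pkt
    transmissions = 0   # every data pkt put on the channel
    for msg in messages:
        acked = False
        while not acked:
            attempt = transmissions
            transmissions += 1
            if attempt in lost_data:
                continue            # pkt lost: timer expires, retransmit
            if seq == expected:     # new pkt: deliver, flip expected seq #
                delivered.append(msg)
                expected ^= 1
            # a duplicate is discarded, but every arriving pkt is ACKed
            if attempt in lost_acks:
                continue            # ACK lost: timer expires, retransmit
            acked = True
        seq ^= 1
    return delivered, transmissions


result = stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={3})
```

In this run the second transmission (a data pkt) and the ACK for the fourth are dropped, so five transmissions are needed, yet each message is delivered exactly once.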

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  • range of sequence numbers must be increased
  • buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:
  • k-bit seq # in pkt header
  • "window" of up to N consecutive unack'ed pkts allowed
  • ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  • timer for each in-flight pkt
  • timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:
  • ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
      • may generate duplicate ACKs
      • need only remember expectedseqnum
  • out-of-order pkt:
      • discard (don't buffer) -> no receiver buffering!
      • Re-ACK pkt with highest in-order seq #
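The receiver side of Go-Back-N is small enough to sketch directly. The sketch assumes unbounded sequence numbers starting at 0 (no k-bit wraparound) and sends no ACK at all until the first in-order pkt has arrived; both are simplifications, and the function name is illustrative.

```python
def gbn_receiver(arrivals):
    """Process seq #'s in arrival order; return (delivered, acks_sent)."""
    expected = 0                 # the only state the receiver keeps
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:      # in-order: deliver and advance
            delivered.append(seq)
            expected += 1
        # out-of-order pkts are discarded, not buffered
        if expected > 0:
            acks.append(expected - 1)  # (re-)ACK highest in-order seq #
    return delivered, acks


# pkt 2 is lost at first; 3 and 4 arrive early and are thrown away,
# then the sender goes back and resends 2, 3, 4.
delivered, acks = gbn_receiver([0, 1, 3, 4, 2, 3, 4])
```

The duplicate ACKs for pkt 1 are what the out-of-order arrivals produce; the sender's timeout for pkt 2 then triggers retransmission of the whole window.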

Selective Repeat
  • receiver individually acknowledges all correctly received pkts
      • buffers pkts, as needed, for eventual in-order delivery to upper layer
  • sender only resends pkts for which ACK not received
      • sender timer for each unACKed pkt
  • sender window
      • N consecutive seq #'s
      • again limits seq #s of sent, unACKed pkts

[Figure: Selective repeat: sender, receiver windows.]

Selective repeat

Sender:
  • data from above: if next available seq # in window, send pkt
  • timeout(n): resend pkt n, restart timer
  • ACK(n) in [sendbase, sendbase+N-1]:
      • mark pkt n as received
      • if n smallest unACKed pkt, advance window base to next unACKed seq #

Receiver:
  • pkt n in [rcvbase, rcvbase+N-1]:
      • send ACK(n)
      • out-of-order: buffer
      • in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  • pkt n in [rcvbase-N, rcvbase-1]:
      • ACK(n)
  • otherwise:
      • ignore
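The three receiver cases can be sketched as follows. As with the other sketches here, sequence numbers are assumed unbounded (no wraparound), and the function name is hypothetical.

```python
def sr_receiver(arrivals, window):
    """Selective-repeat receiver; returns (delivered, acks_sent)."""
    rcv_base = 0
    buffered = set()             # out-of-order pkts waiting for the gap to fill
    delivered, acks = [], []
    for n in arrivals:
        if rcv_base <= n < rcv_base + window:
            acks.append(n)       # individually ACK every pkt in the window
            buffered.add(n)
            while rcv_base in buffered:   # deliver any in-order run
                buffered.remove(rcv_base)
                delivered.append(rcv_base)
                rcv_base += 1
        elif rcv_base - window <= n < rcv_base:
            acks.append(n)       # already delivered: the old ACK was probably lost
        # otherwise: ignore
    return delivered, acks


delivered, acks = sr_receiver([0, 2, 3, 1, 1, 9], window=4)
```

Pkts 2 and 3 are buffered until 1 arrives, the repeated pkt 1 is re-ACKed without being delivered twice, and pkt 9 falls outside both ranges and is ignored.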

TCP: Overview [RFCs 793, 1122, 1323, 2018, 2581]
  • point-to-point
  • reliable, in-order byte stream
  • pipelined
  • send & receive buffers
  • full duplex data
  • connection-oriented
  • flow controlled

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the other application reads data through its socket door.]

TCP segment structure

[Figure: TCP segment, 32 bits wide: source port #, dest port #; sequence number; acknowledgement number; head len, not used, flag bits U A P R S F, Receive window; checksum, Urg data pnter; Options (variable length); application data (variable length).]

  • sequence number, acknowledgement number: counting by bytes of data (not segments!)
  • URG: urgent data (generally not used)
  • ACK: ACK # valid
  • PSH: push data now (generally not used)
  • RST, SYN, FIN: connection estab (setup, teardown commands)
  • Receive window: # bytes rcvr willing to accept
  • checksum: Internet checksum (as in UDP)

TCP seq. #'s and ACKs

Seq. #'s:
  • byte stream "number" of first byte in segment's data

ACKs:
  • seq # of next byte expected from other side

Simple telnet scenario (Host A to Host B, in time order):
  • User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  • Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  • Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
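The numbers in the telnet scenario follow from two rules: the ACK field is the sender's view of the next byte it expects, and a segment's own sequence number is whatever the peer last asked for. A minimal sketch, with a hypothetical helper name and plain dictionaries standing in for segments:

```python
def next_expected(seq: int, data: bytes) -> int:
    """ACK value that acknowledges a segment: seq # of the next byte."""
    return seq + len(data)


# Host A -> Host B: user types 'C'
seg1 = {"seq": 42, "ack": 79, "data": b"C"}

# Host B -> Host A: ACKs the 'C' and echoes it back
seg2 = {"seq": seg1["ack"],
        "ack": next_expected(seg1["seq"], seg1["data"]),
        "data": b"C"}

# Host A -> Host B: ACKs the echoed 'C', carries no data
seg3 = {"seq": seg2["ack"],
        "ack": next_expected(seg2["seq"], seg2["data"]),
        "data": b""}
```

Because each segment carries one byte, each ACK is exactly one more than the sequence number it acknowledges.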

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  • longer than RTT
  • too short: premature timeout
  • too long: slow reaction to segment loss

Q: how to estimate RTT?
  • SampleRTT: measured time from segment transmission until ACK receipt
  • SampleRTT will vary, want estimated RTT "smoother"

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

[Figure: Example RTT estimation, gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT.]

TCP Round Trip Time and Timeout

Setting the timeout:
  • EstimatedRTT plus "safety margin"
      • large variation in EstimatedRTT -> larger safety margin
  • first estimate of how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
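The three formulas combine into a small estimator. This is a sketch with several assumptions the slides do not state: alpha defaults to 0.125, the first sample initialises EstimatedRTT directly and DevRTT to half of it, and DevRTT is updated before EstimatedRTT on each sample (the conventions of RFC 6298). The class name is hypothetical.

```python
class RttEstimator:
    """Exponentially weighted RTT estimate with a deviation-based timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                 # assumed typical value
        self.beta = beta                   # "typically 0.25" per the slide
        self.estimated = first_sample      # assumption: seed with first sample
        self.dev = first_sample / 2        # assumption: RFC 6298 style seed

    def update(self, sample):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # deviation is measured against the estimate *before* this sample
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.estimated)
        self.estimated = (1 - self.alpha) * self.estimated + self.alpha * sample
        return self.timeout()

    def timeout(self):
        return self.estimated + 4 * self.dev


est = RttEstimator(first_sample=100.0)    # milliseconds
interval = est.update(200.0)              # one unusually slow sample
```

A single outlier moves EstimatedRTT only slightly (100 to 112.5 ms here) but inflates DevRTT, so the timeout grows mainly through the safety margin.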

TCP reliable data transfer
  • TCP creates reliable data transfer service on top of IP's unreliable service
  • Pipelined segments
  • Cumulative acks
  • Retransmissions are triggered by:
      • timeout events
      • duplicate acks

TCP sender events

data rcvd from app:
  • Create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running (think of timer as for oldest unacked segment)
  • expiration interval: TimeOutInterval

timeout:
  • retransmit segment that caused timeout
  • restart timer

Ack rcvd:
  • If acknowledges previously unacked segments:
      • update what is known to be acked
      • start timer if there are outstanding segments

TCP retransmission scenarios

[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X). After the timeout Host A resends Seq=92, 8 bytes data, and Host B again replies ACK=100. SendBase = 100.]

[Figure: premature timeout. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data. Host B replies ACK=100 and ACK=120, but the Seq=92 timeout fires first, so Host A resends Seq=92, 8 bytes data, and Host B replies ACK=120. SendBase moves from 100 to 120.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data. Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout. SendBase = 120.]

TCP ACK generation [RFC 1122, RFC 2581]

  • Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
    TCP Receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.
  • Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
    TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.
  • Event at Receiver: Arrival of out-of-order segment, higher-than-expected seq. #. Gap detected.
    TCP Receiver action: Immediately send duplicate ACK, indicating seq. # of next expected byte.
  • Event at Receiver: Arrival of segment that partially or completely fills gap.
    TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap.

Fast Retransmit
  • Time-out period often relatively long
  • Detect lost segments via duplicate ACKs
  • If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase)
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        else                                          (a duplicate ACK for already ACKed segment)
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y = 3)
                resend segment with sequence number y (fast retransmit)
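The ACK-handling event can be sketched as a small class. Timer handling is reduced to a comment, the duplicate counter is reset whenever SendBase advances, and retransmissions are only recorded in a list; these are simplifying assumptions, and the class and attribute names are hypothetical.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records fast retransmissions."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0          # duplicates seen for the current SendBase
        self.retransmitted = []    # seq #'s resent before any timeout

    def on_ack(self, y):
        if y > self.send_base:
            # new data acknowledged: slide SendBase forward
            self.send_base = y
            self.dup_acks = 0
            # a real sender would (re)start its timer here if segments
            # are still outstanding
        else:
            # duplicate ACK for an already ACKed segment
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)  # fast retransmit


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100, 100, 120]:
    sender.on_ack(ack)
```

The first ACK=100 is new; the next three are duplicates and trigger one resend of the segment starting at byte 100. The fourth duplicate does not trigger a second resend.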

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
  • speed-matching service: matching the send rate to the receiving app's drain rate
  • Rcvr advertises spare room by including value of RcvWindow in segments
  • Sender limits unACKed data to RcvWindow
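The two sides of flow control reduce to two subtractions. The slides do not spell out how the spare room is computed; the sketch assumes the common textbook bookkeeping, RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead), and all function and parameter names are illustrative.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: spare room in the buffer, advertised in each segment."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, window):
    """Sender: how much more may be sent without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data
    return max(0, window - in_flight)


# Receiver has a 4096-byte buffer, has received up to byte 3000,
# and the application has read up to byte 1000.
window = rcv_window(4096, 3000, 1000)
room = sendable_bytes(last_byte_sent=5000, last_byte_acked=4000, window=window)
```

When the application stops reading, the advertised window shrinks toward zero and `sendable_bytes` returns 0, which is how the sender is throttled to the application's drain rate.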

TCP Connection Management

Three way handshake:
  • Step 1: client host sends TCP SYN segment to server
      • specifies initial seq #
      • no data
  • Step 2: server host receives SYN, replies with SYNACK segment
      • server allocates buffers
      • specifies server initial seq. #
  • Step 3: client receives SYNACK, replies with ACK segment, which may contain data
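The three steps can be written out as the segments exchanged. The sketch relies on one fact not stated on the slide, that a SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus one. The function name and the dictionary layout are hypothetical.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments in the order they are sent."""
    # Step 1: client -> server, SYN with the client's initial seq #, no data
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server -> client, SYNACK with the server's initial seq #
    synack = {"flags": {"SYN", "ACK"},
              "seq": server_isn,
              "ack": syn["seq"] + 1}
    # Step 3: client -> server, ACK (this segment may already carry data)
    ack = {"flags": {"ACK"},
           "seq": synack["ack"],
           "ack": synack["seq"] + 1}
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

After the exchange both sides agree on where each byte stream starts: the client's first data byte is numbered 1001 and the server's 5001 in this example.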

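TCP Connection Management (cont)

[Figure: closing a connection. The client closes and sends FIN; the server replies with ACK, closes, and sends its own FIN; the client replies with ACK and enters a timed wait, after which the connection is closed.]

TCP Connection Management (cont)

[Figure: TCP client lifecycle and TCP server lifecycle.]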

Principles of Congestion Control

Congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations:
      • lost packets (buffer overflow at routers)
      • long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1
  • two senders, two receivers
  • one router, infinite buffers
  • no retransmission
  • large delays when congested
  • maximum achievable throughput

[Figure: Host A sends original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; Host B; delivered rate $\lambda_{out}$.]

Causes/costs of congestion: scenario 2
  • one router, finite buffers
  • sender retransmission of lost packet

[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) to a router with finite shared output link buffers; Host B; delivered rate $\lambda_{out}$.]

  • always: $\lambda_{in} = \lambda_{out}$ (goodput)
  • "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
  • retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
  • four senders
  • multihop paths
  • timeout/retransmit
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) through finite shared output link buffers; Host B; delivered rate $\lambda_{out}$.]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

Network-assisted congestion control:
  • routers provide feedback to end systems

TCP Congestion Control
  • end-end control (no network assistance)
  • sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$
  • Roughly, $\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ Bytes/sec
  • CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
  • loss event = timeout or 3 duplicate acks
  • TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  • AIMD
  • slow start
  • conservative after timeout events

TCP AIMD
  • multiplicative decrease: cut CongWin in half after loss event
  • additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window versus time for a long-lived TCP connection; y-axis marked at 8, 16 and 24 Kbytes.]

TCP Slow Start
  • When connection begins, increase rate exponentially until first loss event:
      • double CongWin every RTT
      • done by incrementing CongWin for every ACK received
  • Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.]

Refinement
  • After 3 dup ACKs:
      • CongWin is cut in half
      • window then grows linearly (congestion avoidance)
  • But after timeout event:
      • CongWin instead set to 1 MSS
      • window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control
  • When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
  • When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
  • When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
  • When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
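The four rules can be sketched as a per-RTT update function. The sketch works in units of MSS and makes two assumptions the summary leaves open: a window exactly equal to Threshold is treated as congestion avoidance, and slow start is capped so that doubling never overshoots Threshold. The function names and event labels are hypothetical.

```python
def next_window(congwin, threshold, event):
    """One update step; returns (new CongWin, new Threshold) in MSS.

    event is "rtt" (a loss-free round trip), "dupack3" or "timeout".
    """
    if event == "dupack3":
        threshold = congwin / 2
        return threshold, threshold
    if event == "timeout":
        return 1, congwin / 2
    if congwin < threshold:
        # slow start: double per RTT (assumption: capped at Threshold)
        return min(congwin * 2, threshold), threshold
    # congestion avoidance: additive increase of 1 MSS per RTT
    return congwin + 1, threshold


def trace(events, congwin=1, threshold=8):
    """Apply a sequence of events and record CongWin after each one."""
    history = []
    for event in events:
        congwin, threshold = next_window(congwin, threshold, event)
        history.append(congwin)
    return history


windows = trace(["rtt"] * 4 + ["dupack3", "rtt", "timeout", "rtt", "rtt", "rtt"])
```

The trace shows all three behaviours in sequence: exponential growth up to Threshold, a halving after the triple duplicate ACK, and a restart from 1 MSS after the timeout.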

TCP throughput
  • What's the average throughput of TCP as a function of window size and RTT?
      • Ignore slow start
  • Let W be the window size when loss occurs.
  • When window is W, throughput is W/RTT
  • Just after loss, window drops to W/2, throughput to W/(2·RTT).
  • Average throughput: 0.75 W/RTT
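The 0.75 factor is the midpoint of a window that climbs linearly from W/2 back to W between losses. A short numerical check, assuming an idealised sawtooth with evenly spaced samples (the function name and the `steps` parameter are illustrative):

```python
def average_sawtooth_throughput(w, rtt, steps=1000):
    """Average of window/RTT while the window ramps linearly from w/2 to w."""
    windows = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(windows) / len(windows) / rtt


# W = 10 segments, RTT = 0.1 s: the peak rate is 100 segments/s
avg = average_sawtooth_throughput(w=10, rtt=0.1)
```

The average of the two endpoints, (W/2 + W)/2 = 0.75 W, gives the same value without any sampling; the loop only makes the linear-ramp assumption explicit.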

TCP Futures
  • Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  • Requires window size W = 83,333 in-flight segments
  • Throughput in terms of loss rate:

$\dfrac{1.22\cdot\text{MSS}}{\text{RTT}\sqrt{L}}$

  • -> $L = 2\cdot10^{-10}$ Wow
  • New versions of TCP for high-speed needed!
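Both numbers in this example can be reproduced from the stated inputs. The sketch takes MSS as the full 1500-byte segment, converts it to bits, and solves the loss-rate formula for L; the variable and function names are illustrative.

```python
MSS_BITS = 1500 * 8      # 1500-byte segments
RTT = 0.100              # 100 ms, in seconds
TARGET = 10e9            # 10 Gbps, in bits per second


def required_window(target_bps, rtt, mss_bits):
    """In-flight segments needed so that W * MSS / RTT reaches the target."""
    return target_bps * rtt / mss_bits


def required_loss_rate(target_bps, rtt, mss_bits):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bits / (rtt * target_bps)) ** 2


w = required_window(TARGET, RTT, MSS_BITS)
loss = required_loss_rate(TARGET, RTT, MSS_BITS)
```

The loss rate comes out slightly above 2e-10, roughly one lost segment in every five billion, which is the reason the slide calls for new high-speed versions of TCP.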

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

Two competing sessions:
  • Additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2).]
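The convergence argument can be checked with a toy simulation. It assumes two flows with the same RTT that detect loss in the same round whenever their combined rate exceeds the link capacity; real flows are not this synchronised, so this is an idealisation. The function name and the starting rates are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Run synchronised AIMD for two flows sharing one bottleneck."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows halve, which halves the gap between them
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both grow by 1, the gap is unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Flow 2 starts with far more bandwidth than flow 1.
r1, r2 = aimd_two_flows(x1=1.0, x2=50.0, capacity=100.0, rounds=500)
```

Additive increase moves the pair along a 45-degree line and each multiplicative decrease pulls it toward the origin, so the difference between the two rates shrinks by half at every loss event and the flows end up with nearly equal shares.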

Chapter 3: Summary
  • principles behind transport layer services:
      • multiplexing, demultiplexing
      • reliable data transfer
      • flow control
      • congestion control
  • instantiation and implementation in the Internet:
      • UDP
      • TCP

Next:
  • leaving the network "edge" (application, transport layers)
  • into the network "core"

Page 2: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

Transport services and protocols provide logical

communication between app processes running on different hosts

transport protocols run in end systems

reliable in-order delivery (TCP)

unreliable unordered delivery UDP

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysicalnetwork

data linkphysical

logical end-end transport

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

How demultiplexing works host receives IP datagrams host uses IP addresses amp port

numbers to direct segment to appropriate socket

UDP socket identified by two-tuple (dest IP address dest port number)

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection establishment

(which can add delay) simple no connection state

at sender receiver small segment header no congestion control UDP

can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents as

sequence of 16-bit integers checksum addition (1rsquos

complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of received

segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Principles of Reliable data transfer important in app transport link layers characteristics of unreliable channel will determine

complexity of reliable data transfer protocol (rdt)

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

Rdt20 channel with bit errors

underlying channel may flip bits in packet checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) negative acknowledgements (NAKs) sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 error detection receiver feedback

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender adds sequence

number to each pkt sender retransmits current

pkt if ACKNAK garbled receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21

Sender seq added to pkt two seq rsquos (01) will

suffice must check if received

ACKNAK corrupted

Receiver must check if received

packet is duplicate note receiver can not

know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK duplicate ACK at sender results in same action as NAK

retransmit current pkt

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq ACKs

retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT too short premature

timeout too long slow reaction to

segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

SampleRTT will vary want estimated RTT ldquosmootherrdquo

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP reliable data transfer

TCP creates reliable data transfer service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks Retransmissions are triggered by

timeout events duplicate acks

TCP sender eventsdata rcvd from app Create segment with

seq seq is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment that

caused timeout restart timer

Ack rcvd If acknowledges

previously unacked segments update what is known to

be acked start timer if there are

outstanding segments

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment startsat lower end of gap

Fast Retransmit Time-out period often relatively long Detect lost segments via duplicate ACKs If sender receives 3 ACKs for the same data it supposes that

segment after ACKed data was lost

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

fast retransmita duplicate ACK for already ACKed segment

TCP Flow Control

receiver

speed-matching service matching the send rate to the receiving apprsquos drain rate

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

TCP Connection Management

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too much data

too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of

lost packet finite shared output link buffers

Host Ain original data

Host B

out

in original data plus retransmitted data

always: $\lambda_{in} = \lambda_{out}$ (goodput)

"perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$

retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:
  more work (retrans) for given "goodput"
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders

multihop paths

timeout/retransmit

when packet dropped, any upstream transmission capacity used for that packet was wasted!

[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

Network-assisted congestion control:
  routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$

Roughly, $\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
  loss event = timeout or 3 duplicate acks
  TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  AIMD
  slow start
  conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event

additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT; time runs downward]
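A small sketch of why one increment per ACK doubles the window every RTT. It assumes the window starts at 1 MSS and that every segment sent in a round is ACKed within that round; the function name is hypothetical.

```python
def slow_start_rounds(mss: int, rounds: int) -> list:
    """Return CongWin (in bytes) at the start of each RTT during slow start."""
    cong_win = mss                      # assumption: start with one segment
    history = [cong_win]
    for _ in range(rounds):
        acks = cong_win // mss          # one ACK per segment sent this RTT
        for _ in range(acks):
            cong_win += mss             # increment CongWin for every ACK received
        history.append(cong_win)
    return history


print(slow_start_rounds(mss=1, rounds=3))
```

With `mss=1` the history reads as segment counts: one, two, four, eight.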

Refinement

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly (congestion avoidance)

But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
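The four rules can be sketched as a small state holder. This is an illustration under stated assumptions, not a TCP implementation: the window is updated once per loss-free RTT, the case CongWin equal to Threshold is treated as congestion avoidance, and doubling is allowed to overshoot Threshold. Class and method names are hypothetical.

```python
class CongestionControl:
    def __init__(self, mss: int, threshold: int):
        self.mss = mss
        self.cong_win = mss          # assumption: start at 1 MSS
        self.threshold = threshold

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            self.cong_win *= 2               # slow start: exponential growth
        else:
            self.cong_win += self.mss        # congestion avoidance: linear growth

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win // 2  # Threshold set to CongWin/2
        self.cong_win = self.threshold       # CongWin set to Threshold

    def on_timeout(self):
        self.threshold = self.cong_win // 2  # Threshold set to CongWin/2
        self.cong_win = self.mss             # CongWin set to 1 MSS
```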

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
  Ignore slow start

Let W be the window size when loss occurs.

When window is W, throughput is $W/\text{RTT}$

Just after loss, window drops to $W/2$, throughput to $W/(2\,\text{RTT})$

Average throughput: $0.75\,W/\text{RTT}$
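The 0.75 factor follows if one assumes the window climbs linearly from W/2 back to W between loss events, so that the average window is the midpoint of the two:

```latex
\[
\text{average throughput}
  = \frac{1}{\text{RTT}} \cdot \frac{\tfrac{W}{2} + W}{2}
  = \frac{3}{4} \cdot \frac{W}{\text{RTT}}
  = 0.75\,\frac{W}{\text{RTT}}
\]
```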

TCP Futures

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate: $\frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$ Wow

New versions of TCP for high-speed needed!
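The two numbers in this example can be checked with a few lines of arithmetic. The sketch assumes the loss-rate relation is throughput = 1.22 * MSS / (RTT * sqrt(L)) with MSS expressed in bits; the function names are hypothetical.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain the throughput: rate * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(window_segments(10e9, 0.1, 1500))
print(required_loss_rate(10e9, 0.1, 1500))
```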

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
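A rough simulation of the two-session argument. It assumes an idealized model in which both sessions see a loss in the same round whenever their combined rate exceeds the capacity; names and the unit step are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int, step: float = 1.0):
    """Evolve two AIMD flows that share one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2          # loss: decrease by factor of 2
        else:
            x1, x2 = x1 + step, x2 + step    # additive increase (slope of 1)
    return x1, x2


x1, x2 = aimd_two_flows(1, 80, capacity=100, rounds=500)
print(abs(x1 - x2))
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the rates drift toward the equal-share line.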

Chapter 3: Summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control

instantiation and implementation in the Internet:
  UDP
  TCP

Next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw!
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP: retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3: Summary

Multiplexing/demultiplexing

Demultiplexing at rcv host: delivering received segments to correct socket

Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)

[Figure: protocol stacks (application, transport, network, link, physical) on host 1, host 2 and host 3, with processes P1 to P4 attached to sockets]

How demultiplexing works

host receives IP datagrams

host uses IP addresses & port numbers to direct segment to appropriate socket

UDP socket identified by two-tuple: (dest IP address, dest port number)

TCP socket identified by 4-tuple: source IP address, source port number, dest IP address, dest port number

[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (message)]
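The difference between the two lookups can be sketched with two dictionaries. The class, the method names and the string socket labels are hypothetical; a real stack keeps this state in the kernel.

```python
class Demultiplexer:
    def __init__(self):
        self.udp_sockets = {}   # (dest_ip, dest_port) -> socket
        self.tcp_sockets = {}   # (src_ip, src_port, dest_ip, dest_port) -> socket

    def bind_udp(self, dest_ip, dest_port, sock):
        self.udp_sockets[(dest_ip, dest_port)] = sock

    def add_tcp_connection(self, src_ip, src_port, dest_ip, dest_port, sock):
        self.tcp_sockets[(src_ip, src_port, dest_ip, dest_port)] = sock

    def deliver(self, proto, src_ip, src_port, dest_ip, dest_port):
        """Return the socket for a segment, or None if no socket matches."""
        if proto == "udp":
            # UDP ignores the source: only the destination two-tuple matters.
            return self.udp_sockets.get((dest_ip, dest_port))
        # TCP uses the full 4-tuple, so each connection has its own socket.
        return self.tcp_sockets.get((src_ip, src_port, dest_ip, dest_port))
```

Two UDP segments from different senders to the same destination port reach the same socket, while two TCP segments to the same destination port reach different sockets when their source address or source port differs.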

Connectionless demux (cont)

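DatagramSocket serverSocket = new DatagramSocket(6428);

SP provides "return address"

[Figure: server at IP C and clients at IP A and IP B. One client sends a segment with SP 9157, DP 6428 and the reply carries SP 6428, DP 9157; the other sends SP 5775, DP 6428 and the reply carries SP 6428, DP 5775]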

Connection-oriented demux (cont)

[Figure: clients at IP A and IP B open connections to the server at IP C on DP 80, using SP 9157 and SP 5775; each segment is labeled with its S-IP and D-IP, and the server directs it to the socket of the matching connection]

UDP: User Datagram Protocol [RFC 768]

"no frills," "bare bones" Internet transport protocol

"best effort" service, UDP segments may be:
  lost
  delivered out of order to app

connectionless:
  no handshaking between UDP sender, receiver
  each UDP segment handled independently of others

Why is there a UDP?
  no connection establishment (which can add delay)
  simple: no connection state at sender, receiver
  small segment header
  no congestion control: UDP can blast away as fast as desired

UDP: more

often used for streaming multimedia apps
  loss tolerant
  rate sensitive

other UDP uses
  DNS
  SNMP

reliable transfer over UDP: add reliability at application layer
  application-specific error recovery!

[Figure: UDP segment format, 32 bits wide: source port #, dest port #, length, checksum, application data (message). Length is in bytes of the UDP segment, including header]

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
  treat segment contents as sequence of 16-bit integers
  checksum: addition (1's complement sum) of segment contents
  sender puts checksum value into UDP checksum field

Receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later ...
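A minimal sketch of the sender and receiver computation. It simplifies real UDP in two labeled ways: the pseudo-header that UDP also covers is left out, and odd-length input is padded with a zero byte. The stored value is the one's complement of the 16-bit one's complement sum, as in the Internet checksum; function names are hypothetical.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add the data as 16-bit big-endian integers, wrapping any carry around."""
    if len(data) % 2:
        data += b"\x00"                               # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)      # end-around carry
    return total


def checksum(data: bytes) -> int:
    """Sender side: value to place in the checksum field."""
    return ~ones_complement_sum16(data) & 0xFFFF


def is_intact(data: bytes, checksum_field: int) -> bool:
    """Receiver side: recompute and compare with the received field."""
    return checksum(data) == checksum_field
```

A match only means no error was detected: two errors that cancel in the sum would pass the check.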

Principles of Reliable data transfer

important in app., transport, link layers

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Rdt1.0: reliable transfer over a reliable channel

underlying channel perfectly reliable
  no bit errors
  no loss of packets

Rdt2.0: channel with bit errors

underlying channel may flip bits in packet
  checksum to detect bit errors

the question: how to recover from errors:
  acknowledgements (ACKs)
  negative acknowledgements (NAKs)
  sender retransmits pkt on receipt of NAK

new mechanisms in rdt2.0:
  error detection
  receiver feedback

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
  sender doesn't know what happened at receiver!
  can't just retransmit: possible duplicate

Handling duplicates:
  sender adds sequence number to each pkt
  sender retransmits current pkt if ACK/NAK garbled
  receiver discards (doesn't deliver up) duplicate pkt

stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1

Sender:
  seq # added to pkt
  two seq. #'s (0,1) will suffice
  must check if received ACK/NAK corrupted

Receiver:
  must check if received packet is duplicate
  note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol

same functionality as rdt2.1, using ACKs only

instead of NAK, receiver sends ACK for last pkt received OK

duplicate ACK at sender results in same action as NAK: retransmit current pkt
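The receiver side of this NAK-free scheme can be sketched as follows. The class name and the `corrupt` flag (standing in for a failed checksum) are assumptions for illustration.

```python
class Rdt22Receiver:
    def __init__(self):
        self.expected = 0        # sequence number (0 or 1) the receiver waits for
        self.delivered = []

    def receive(self, seq: int, payload, corrupt: bool = False) -> int:
        """Process one packet and return the sequence number carried by the ACK."""
        if not corrupt and seq == self.expected:
            self.delivered.append(payload)   # new, intact packet: deliver up
            self.expected ^= 1
            return seq
        # Corrupt or duplicate packet: re-send the ACK for the last packet received OK.
        return self.expected ^ 1
```

A sender waiting for ACK 1 that receives ACK 0 again treats it like a NAK and retransmits the current packet.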

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
  retransmits if no ACK received in this time
  if pkt (or ACK) just delayed (not lost):
    retransmission will be duplicate, but use of seq. #'s already handles this
    receiver must specify seq # of pkt being ACKed
  requires countdown timer
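A toy simulation of the timeout-and-retransmit approach. The channel is modeled by a scripted list of outcomes per transmission attempt rather than by a real timer, which is a simplification made here; the function name and the outcome labels are hypothetical.

```python
def rdt30_transfer(messages, fates):
    """Stop-and-wait transfer over a lossy channel.

    fates gives the outcome of each attempt, in order: "data" (data packet
    lost), "ack" (ACK lost) or "ok". Attempts past the end count as "ok".
    """
    fates = iter(fates)
    seq = 0                 # sender's current sequence number (0 or 1)
    expected = 0            # receiver's expected sequence number
    delivered = []
    transmissions = 0
    for msg in messages:
        while True:
            transmissions += 1
            fate = next(fates, "ok")
            if fate == "data":
                continue                  # no ACK arrives: timer expires, retransmit
            if seq == expected:           # new packet: deliver it
                delivered.append(msg)
                expected ^= 1
            # otherwise a duplicate: discarded, but still ACKed with its seq #
            if fate == "ack":
                continue                  # ACK lost: timer expires, retransmit
            break                         # ACK for seq received
        seq ^= 1
    return delivered, transmissions
```

Even when an ACK is lost and the packet is sent twice, the sequence number lets the receiver deliver it only once.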

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:
  k-bit seq # in pkt header
  "window" of up to N, consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  timer for each in-flight pkt
  timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer) -> no receiver buffering!
    Re-ACK pkt with highest in-order seq #
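A compact sketch of both Go-Back-N endpoints. For brevity it uses unbounded integer sequence numbers instead of a k-bit field and leaves out the timers themselves; class and method names are hypothetical.

```python
class GbnSender:
    def __init__(self, window: int):
        self.n = window
        self.base = 0          # oldest unACKed sequence number
        self.nextseq = 0       # next sequence number to use

    def send(self):
        """Return the sequence number sent, or None if the window is full."""
        if self.nextseq >= self.base + self.n:
            return None
        seq = self.nextseq
        self.nextseq += 1
        return seq

    def on_ack(self, n: int):
        if n >= self.base:
            self.base = n + 1  # cumulative ACK: everything up to n is ACKed

    def on_timeout(self):
        """Sequence numbers to retransmit: all unACKed packets in the window."""
        return list(range(self.base, self.nextseq))


class GbnReceiver:
    def __init__(self):
        self.expected = 0      # the only state needed: expectedseqnum

    def on_packet(self, seq: int) -> int:
        """Return the ACK number: highest in-order seq # received (-1 if none)."""
        if seq == self.expected:
            self.expected += 1
        return self.expected - 1   # out-of-order packets are discarded and re-ACKed
```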

Selective Repeat

receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer

sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt

sender window
  N consecutive seq #'s
  again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat

sender

data from above:
  if next available seq # in window, send pkt

timeout(n):
  resend pkt n, restart timer

ACK(n) in [sendbase,sendbase+N]:
  mark pkt n as received
  if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver

pkt n in [rcvbase, rcvbase+N-1]
  send ACK(n)
  out-of-order: buffer
  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt

pkt n in [rcvbase-N,rcvbase-1]
  ACK(n)

otherwise:
  ignore
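The receiver rules can be sketched as one method with three branches. As a simplification, sequence numbers are plain integers without wraparound; the class name is hypothetical.

```python
class SrReceiver:
    def __init__(self, window: int):
        self.n = window
        self.base = 0          # rcvbase: next not-yet-received packet
        self.buffer = {}       # out-of-order packets waiting for delivery
        self.delivered = []

    def on_packet(self, n: int, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.base <= n <= self.base + self.n - 1:
            self.buffer[n] = data
            # Deliver the packet if in order, plus any buffered in-order packets.
            while self.base in self.buffer:
                self.delivered.append(self.buffer.pop(self.base))
                self.base += 1
            return n
        if self.base - self.n <= n <= self.base - 1:
            return n           # already delivered: ACK again so the sender can advance
        return None            # otherwise: ignore
```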

TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581

full duplex data

connection-oriented

flow controlled

point-to-point

reliable, in-order byte stream

pipelined

send & receive buffers

[Figure: the application writes data through the socket door into the TCP send buffer; a segment carries it to the TCP receive buffer, from which the application reads data through the socket door]

TCP segment structure

[Figure: TCP segment, 32 bits wide]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, Receive window
  checksum, Urg data pointer
  Options (variable length)
  application data (variable length)

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)

Receive window: # bytes rcvr willing to accept

sequence number, acknowledgement number: counting by bytes of data (not segments!)

checksum: Internet checksum (as in UDP)

TCP seq. #'s and ACKs

Seq. #'s: byte stream "number" of first byte in segment's data

ACKs: seq # of next byte expected from other side

simple telnet scenario (Host A and Host B, time runs downward):
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C': Host B sends Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C': Host A sends Seq=43, ACK=80

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  longer than RTT
  too short: premature timeout
  too long: slow reaction to segment loss

Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
  SampleRTT will vary, want estimated RTT "smoother"

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106), showing SampleRTT and Estimated RTT]

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
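The two moving averages and the resulting timeout can be sketched together. Several choices here are assumptions rather than something the slides state: the weight `alpha=0.125` for EstimatedRTT, seeding EstimatedRTT with the first sample and DevRTT with 0, and updating DevRTT before EstimatedRTT. Class and attribute names are hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha                 # assumed weight for EstimatedRTT
        self.beta = beta                   # weight for DevRTT (typically 0.25)
        self.estimated_rtt = first_sample  # assumed initialization
        self.dev_rtt = 0.0                 # assumed initialization

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

For example, starting from a 100 ms estimate, a single 120 ms sample moves the estimate only slightly but widens the safety margin noticeably.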

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service

Pipelined segments

Cumulative acks

Retransmissions are triggered by:
  timeout events
  duplicate acks

TCP sender events:

data rcvd from app:
  Create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running (think of timer as for oldest unacked segment)
  expiration interval: TimeOutInterval

timeout:
  retransmit segment that caused timeout
  restart timer

Ack rcvd:
  If acknowledges previously unacked segments
    update what is known to be acked
    start timer if there are outstanding segments

TCP: retransmission scenarios

lost ACK scenario: Host A sends Seq=92, 8 bytes data. Host B replies ACK=100, which is lost (X). The Seq=92 timer times out and A resends Seq=92, 8 bytes data. B replies ACK=100 again. SendBase = 100.

premature timeout: Host A sends Seq=92, 8 bytes data and then Seq=100, 20 bytes data. Host B replies ACK=100 and ACK=120, but the Seq=92 timer times out before the ACKs arrive, so A resends Seq=92, 8 bytes data. B replies ACK=120. SendBase goes from 100 to 120 and stays at 120.

TCP retransmission scenarios (more)

Cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and then Seq=100, 20 bytes data. Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is retransmitted. SendBase = 120.

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of out-of-order segment, higher-than-expected seq #. Gap detected
TCP Receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte

Event at Receiver: Arrival of segment that partially or completely fills gap
TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long

Detect lost segments via duplicate ACKs

If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
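The ACK-handling event can be turned into a small runnable sketch. Timers are left out, and the class name, the `retransmitted` log and the reset of the duplicate counter on a new ACK are assumptions added for illustration.

```python
class FastRetransmitSender:
    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent by fast retransmit

    def on_ack(self, y: int):
        if y > self.send_base:
            self.send_base = y       # new data ACKed
            self.dup_acks = 0        # assumption: start counting afresh
        else:
            self.dup_acks += 1       # a duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment with seq # y
```

The segment is resent exactly once, on the third duplicate; further duplicates do not trigger another resend until new data is ACKed.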


How demultiplexing works host receives IP datagrams host uses IP addresses amp port

numbers to direct segment to appropriate socket

UDP socket identified by two-tuple (dest IP address dest port number)

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection establishment

(which can add delay) simple no connection state

at sender receiver small segment header no congestion control UDP

can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents as

sequence of 16-bit integers checksum addition (1rsquos

complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of received

segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Principles of Reliable data transfer important in app transport link layers characteristics of unreliable channel will determine

complexity of reliable data transfer protocol (rdt)

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

Rdt20 channel with bit errors

underlying channel may flip bits in packet checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) negative acknowledgements (NAKs) sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 error detection receiver feedback

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender adds sequence

number to each pkt sender retransmits current

pkt if ACKNAK garbled receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21

Sender seq added to pkt two seq rsquos (01) will

suffice must check if received

ACKNAK corrupted

Receiver must check if received

packet is duplicate note receiver can not

know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK duplicate ACK at sender results in same action as NAK

retransmit current pkt

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq ACKs

retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT too short premature

timeout too long slow reaction to

segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

SampleRTT will vary want estimated RTT ldquosmootherrdquo

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP reliable data transfer

TCP creates reliable data transfer service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks Retransmissions are triggered by

timeout events duplicate acks

TCP sender eventsdata rcvd from app Create segment with

seq seq is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment that

caused timeout restart timer

Ack rcvd If acknowledges

previously unacked segments update what is known to

be acked start timer if there are

outstanding segments

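TCP retransmission scenarios
Figure, lost ACK scenario (Host A, Host B):
  • A sends Seq=92, 8 bytes data
  • B replies ACK=100, which is lost (X)
  • A's timer for Seq=92 times out; A resends Seq=92, 8 bytes data
  • B replies ACK=100; SendBase = 100
Figure, premature timeout (Host A, Host B):
  • A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  • B replies ACK=100 and ACK=120
  • A's timer for Seq=92 times out before the ACKs arrive; A resends Seq=92, 8 bytes data
  • SendBase = 100, then SendBase = 120 as the ACKs arrive
  • B answers the retransmission with ACK=120; SendBase = 120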

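TCP retransmission scenarios (more)
Figure, cumulative ACK scenario (Host A, Host B):
  • A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  • B replies ACK=100, which is lost (X), and ACK=120
  • ACK=120 arrives before the timeout, so SendBase = 120 and nothing is retransmitted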

TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action
  • Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
  • Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
  • Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
  • Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit
  • Time-out period often relatively long
  • Detect lost segments via duplicate ACKs
  • If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    // fast retransmit
    }
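The ACK-handling event can be turned into a small runnable sketch. The class `FastRetransmitSender` and its fields are hypothetical names chosen for illustration; the "resend" is only recorded in a field rather than sent, and resetting the duplicate counter on a new ACK is an assumption the slide leaves implicit.

```java
// Sketch of the sender's ACK handler with fast retransmit (no real network I/O).
class FastRetransmitSender {
    long sendBase;               // oldest unacknowledged byte
    long nextSeqNum;             // next byte that would be sent
    int dupAckCount = 0;         // duplicate ACKs seen for the current sendBase
    boolean timerRunning;
    long lastFastRetransmit = -1; // stand-in for "resend segment y"

    FastRetransmitSender(long sendBase, long nextSeqNum) {
        this.sendBase = sendBase;
        this.nextSeqNum = nextSeqNum;
        this.timerRunning = nextSeqNum > sendBase;
    }

    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;
            dupAckCount = 0;                      // assumption: new data ACKed resets the count
            timerRunning = sendBase < nextSeqNum; // timer only if segments are outstanding
        } else {
            // A duplicate ACK for an already ACKed segment.
            dupAckCount++;
            if (dupAckCount == 3) {
                lastFastRetransmit = y;           // fast retransmit of the segment starting at y
            }
        }
    }

    public static void main(String[] args) {
        FastRetransmitSender s = new FastRetransmitSender(92, 140);
        for (long ack : new long[] {100, 100, 100, 100}) {
            s.onAck(ack);
        }
        System.out.println("fast retransmit of seq " + s.lastFastRetransmit);
    }
}
```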

TCP Flow Control
  • flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
  • speed-matching service: matching the send rate to the receiving app's drain rate
  • Rcvr advertises spare room by including value of RcvWindow in segments
  • Sender limits unACKed data to RcvWindow

TCP Connection Management
Three way handshake:
  • Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data
  • Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq #
  • Step 3: client receives SYNACK, replies with ACK segment, which may contain data

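TCP Connection Management (cont)
Figure, closing a connection (client, server): client closes and sends FIN; server replies with ACK, closes, and sends its own FIN; client replies with ACK and enters timed wait; connection closed.

TCP Connection Management (cont)
Figures: TCP client lifecycle; TCP server lifecycle.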

Principles of Congestion Control
Congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1
  • two senders, two receivers
  • one router, infinite buffers
  • no retransmission
  • large delays when congested
  • maximum achievable throughput
Figure: Host A, $\lambda_{in}$: original data; Host B; $\lambda_{out}$; unlimited shared output link buffers.

Causes/costs of congestion: scenario 2
  • one router, finite buffers
  • sender retransmission of lost packet
Figure: Host A, $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; Host B; $\lambda_{out}$; finite shared output link buffers.
  • always: $\lambda_{in} = \lambda_{out}$ (goodput)
  • "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
  • retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
  • four senders
  • multihop paths
  • timeout/retransmit
  • when packet dropped, any upstream transmission capacity used for that packet was wasted
Figure: Host A, $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; Host B; $\lambda_{out}$; finite shared output link buffers.

Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP
Network-assisted congestion control:
  • routers provide feedback to end systems

TCP Congestion Control
  • end-end control (no network assistance)
  • sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$
  • Roughly, $\text{rate} = \text{CongWin} / \text{RTT}$ Bytes/sec
  • CongWin is dynamic, function of perceived network congestion
How does sender perceive congestion?
  • loss event = timeout or 3 duplicate acks
  • TCP sender reduces rate (CongWin) after loss event
three mechanisms:
  • AIMD
  • slow start
  • conservative after timeout events

TCP AIMD
  • multiplicative decrease: cut CongWin in half after loss event
  • additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
Figure: long-lived TCP connection. Congestion window (8, 16, 24 Kbytes) versus time.

TCP Slow Start
  • When connection begins, increase rate exponentially until first loss event: double CongWin every RTT; done by incrementing CongWin for every ACK received
  • Summary: initial rate is slow but ramps up exponentially fast
Figure (Host A, Host B, time): A sends one segment, then two segments, then four segments, one burst per RTT.

Refinement
  • After 3 dup ACKs: CongWin is cut in half; window then grows linearly (congestion avoidance)
  • But after timeout event: CongWin instead set to 1 MSS; window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control
  • When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
  • When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
  • When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold
  • When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS
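The four summary rules map onto a small state holder. The sketch below counts the window in MSS units; the class name `CongestionWindow`, the starting window of 1 MSS, and the per-ACK increment of 1/CongWin during congestion avoidance (which adds up to roughly 1 MSS per RTT) are assumptions made for illustration.

```java
// Sketch of the CongWin / Threshold rules, with the window measured in MSS.
class CongestionWindow {
    double congWin = 1.0;  // assumption: start at 1 MSS
    double threshold;

    CongestionWindow(double initialThreshold) {
        this.threshold = initialThreshold;
    }

    void onNewAck() {
        if (congWin < threshold) {
            congWin += 1.0;           // slow start: +1 MSS per ACK, doubling per RTT
        } else {
            congWin += 1.0 / congWin; // congestion avoidance: about +1 MSS per RTT
        }
    }

    void onTripleDupAck() {
        threshold = congWin / 2;
        congWin = threshold;          // halve, then continue linearly
    }

    void onTimeout() {
        threshold = congWin / 2;
        congWin = 1.0;                // back to 1 MSS, slow start up to Threshold
    }

    public static void main(String[] args) {
        CongestionWindow cw = new CongestionWindow(8.0);
        for (int i = 0; i < 10; i++) {
            cw.onNewAck();
        }
        System.out.println("after 10 ACKs: " + cw.congWin);
        cw.onTimeout();
        System.out.println("after timeout: congWin=" + cw.congWin + " threshold=" + cw.threshold);
    }
}
```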

TCP throughput
  • What's the average throughput of TCP as a function of window size and RTT? Ignore slow start
  • Let W be the window size when loss occurs
  • When window is W, throughput is $W/\text{RTT}$
  • Just after loss, window drops to $W/2$, throughput to $W/(2\,\text{RTT})$
  • Average throughput: $0.75\,W/\text{RTT}$

TCP Futures
  • Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
  • Requires window size W = 83,333 in-flight segments
  • Throughput in terms of loss rate L:
$\dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
  • L = $2 \cdot 10^{-10}$. Wow!
  • New versions of TCP for high-speed needed!
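The two numbers in this example can be checked with a short calculation. The helper names below are hypothetical; the only inputs are the slide's figures (1500-byte segments, 100 ms RTT, 10 Gbps) and the throughput formula 1.22·MSS/(RTT·√L), solved here for L.

```java
// Worked check of the "TCP Futures" example.
class TcpFutures {
    // Segments that must be in flight to sustain the target rate.
    static double windowSegments(double throughputBps, double rttSec, int mssBytes) {
        return throughputBps * rttSec / (mssBytes * 8.0);
    }

    // Loss rate L obtained by solving throughput = 1.22 * MSS / (RTT * sqrt(L)) for L.
    static double requiredLossRate(double throughputBps, double rttSec, int mssBytes) {
        double root = 1.22 * mssBytes * 8.0 / (rttSec * throughputBps);
        return root * root;
    }

    public static void main(String[] args) {
        System.out.println("W = " + windowSegments(10e9, 0.1, 1500) + " segments");
        System.out.println("L = " + requiredLossRate(10e9, 0.1, 1500));
    }
}
```

The window comes out just above 83,333 segments and the loss rate close to 2·10^-10, matching the slide.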

TCP Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.

Why is TCP fair?
Two competing sessions:
  • Additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally
Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".
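The convergence argument can be tried numerically. The sketch below is a deliberately idealized model and not a TCP simulation: both sessions are assumed to add one unit per round, and both are assumed to halve in the same round whenever their combined rate exceeds the capacity R. The name `AimdFairness` is hypothetical.

```java
// Idealized two-flow AIMD model: additive increase keeps the gap between the
// flows constant, each multiplicative decrease halves it.
class AimdFairness {
    static double[] simulate(double x1, double x2, double capacity, int rounds) {
        for (int i = 0; i < rounds; i++) {
            if (x1 + x2 > capacity) {
                x1 /= 2;   // loss: decrease window by factor of 2
                x2 /= 2;
            } else {
                x1 += 1;   // congestion avoidance: additive increase
                x2 += 1;
            }
        }
        return new double[] {x1, x2};
    }

    public static void main(String[] args) {
        double[] rates = simulate(10, 80, 100, 1000);
        System.out.println("flow 1: " + rates[0] + ", flow 2: " + rates[1]);
    }
}
```

Starting from very unequal rates, the two flows end up with nearly identical shares, which is the movement toward the equal bandwidth share line in the figure.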

Chapter 3: Summary
  • principles behind transport layer services: multiplexing, demultiplexing; reliable data transfer; flow control; congestion control
  • instantiation and implementation in the Internet: UDP, TCP
Next:
  • leaving the network "edge" (application, transport layers)
  • into the network "core"

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3: Summary
Page 5: Chapter 3: Transport Layer. Our goals: understand principles behind transport layer services: multiplexing/demultiplexing, reliable data transfer.

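Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
Figure: two clients (IP: A, IP: B) and a server (IP: C), with processes P1, P2, P3. Segments shown:
  • SP: 9157, DP: 6428 (client to server) and SP: 6428, DP: 9157 (reply)
  • SP: 5775, DP: 6428 (client to server) and SP: 6428, DP: 5775 (reply)
SP provides "return address"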

Connection-oriented demux (cont)
Figure: clients (IP: A, IP: B) and server (IP: C), with processes P1 to P6. Segments arriving at the server:
  • SP: 9157, DP: 80, S-IP: A, D-IP: C
  • SP: 9157, DP: 80, S-IP: B, D-IP: C
  • SP: 5775, DP: 80, S-IP: B, D-IP: C
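The difference between the two demultiplexing figures can be sketched with two lookup tables. This is an illustration only: `DemuxTable`, the string keys, and the socket labels are hypothetical, and a real stack would not use strings. The UDP table is keyed by destination port alone, while the TCP table is keyed by source IP, source port, destination IP, and destination port.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: UDP sockets are found by destination port, TCP sockets by the full 4-tuple.
class DemuxTable {
    private final Map<Integer, String> udpSockets = new HashMap<>();
    private final Map<String, String> tcpSockets = new HashMap<>();

    void bindUdp(int destPort, String socketName) {
        udpSockets.put(destPort, socketName);
    }

    void bindTcp(String srcIp, int srcPort, String destIp, int destPort, String socketName) {
        tcpSockets.put(key(srcIp, srcPort, destIp, destPort), socketName);
    }

    // Source fields are ignored for lookup; they only serve as the "return address".
    String demuxUdp(String srcIp, int srcPort, int destPort) {
        return udpSockets.get(destPort);
    }

    String demuxTcp(String srcIp, int srcPort, String destIp, int destPort) {
        return tcpSockets.get(key(srcIp, srcPort, destIp, destPort));
    }

    private static String key(String srcIp, int srcPort, String destIp, int destPort) {
        return srcIp + ":" + srcPort + "->" + destIp + ":" + destPort;
    }

    public static void main(String[] args) {
        DemuxTable server = new DemuxTable();
        server.bindUdp(6428, "serverSocket");
        server.bindTcp("A", 9157, "C", 80, "conn-1");
        server.bindTcp("B", 9157, "C", 80, "conn-2");
        System.out.println(server.demuxUdp("A", 9157, 6428));
        System.out.println(server.demuxTcp("B", 9157, "C", 80));
    }
}
```

Two clients that happen to use the same source port still reach different TCP sockets, because the source IP is part of the key.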

UDP: User Datagram Protocol [RFC 768]
  • "no frills," "bare bones" Internet transport protocol
  • "best effort" service, UDP segments may be: lost; delivered out of order to app
  • connectionless: no handshaking between UDP sender, receiver; each UDP segment handled independently of others
Why is there a UDP?
  • no connection establishment (which can add delay)
  • simple: no connection state at sender, receiver
  • small segment header
  • no congestion control: UDP can blast away as fast as desired

UDP: more
  • often used for streaming multimedia apps: loss tolerant, rate sensitive
  • other UDP uses: DNS, SNMP
  • reliable transfer over UDP: add reliability at application layer; application-specific error recovery!
Figure: UDP segment format (32 bits wide): source port #, dest port #; length, checksum; application data (message). Length is in bytes of UDP segment, including header.

UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
Sender:
  • treat segment contents as sequence of 16-bit integers
  • checksum: addition (1's complement sum) of segment contents
  • sender puts checksum value into UDP checksum field
Receiver:
  • compute checksum of received segment
  • check if computed checksum equals checksum field value: NO - error detected; YES - no error detected. But maybe errors nonetheless? More later ...
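The sender and receiver steps can be sketched as follows. The class name `InternetChecksum` and the sample words are illustrative; the segment is assumed to be already split into 16-bit words, and the stored value is the complement of the 1's complement sum, as in the Internet checksum.

```java
// Sketch of a 16-bit 1's complement checksum over pre-split 16-bit words.
class InternetChecksum {
    static int checksum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if ((sum & 0x10000) != 0) {
                sum = (sum & 0xFFFF) + 1; // wrap the carry back into the low bit
            }
        }
        return ~sum & 0xFFFF;             // complement of the sum goes in the checksum field
    }

    // Receiver side: recompute and compare with the checksum field.
    static boolean noErrorDetected(int[] words, int checksumField) {
        return checksum(words) == checksumField;
    }

    public static void main(String[] args) {
        int[] segment = {0x6660, 0x5555, 0x8F0C};
        int field = checksum(segment);
        System.out.printf("checksum = 0x%04X%n", field);
        segment[1] ^= 0x0004;             // flip one bit in transit
        System.out.println("no error detected: " + noErrorDetected(segment, field));
    }
}
```

A match does not prove the segment is intact: two changes that cancel in the sum go unnoticed, which is the "maybe errors nonetheless" caveat.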

Principles of Reliable data transfer
  • important in app., transport, link layers
  • characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Rdt1.0: reliable transfer over a reliable channel
  • underlying channel perfectly reliable: no bit errors, no loss of packets

Rdt2.0: channel with bit errors
  • underlying channel may flip bits in packet; checksum to detect bit errors
  • the question: how to recover from errors: acknowledgements (ACKs); negative acknowledgements (NAKs); sender retransmits pkt on receipt of NAK
  • new mechanisms in rdt2.0: error detection; receiver feedback

rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
  • sender doesn't know what happened at receiver!
  • can't just retransmit: possible duplicate
Handling duplicates:
  • sender adds sequence number to each pkt
  • sender retransmits current pkt if ACK/NAK garbled
  • receiver discards (doesn't deliver up) duplicate pkt
stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1
Sender:
  • seq # added to pkt
  • two seq. #'s (0,1) will suffice
  • must check if received ACK/NAK corrupted
Receiver:
  • must check if received packet is duplicate
  • note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol
  • same functionality as rdt2.1, using ACKs only
  • instead of NAK, receiver sends ACK for last pkt received OK
  • duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
  • checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Approach: sender waits "reasonable" amount of time for ACK
  • retransmits if no ACK received in this time
  • if pkt (or ACK) just delayed (not lost): retransmission will be duplicate, but use of seq. #'s already handles this; receiver must specify seq # of pkt being ACKed
  • requires countdown timer
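How a one-bit sequence number lets the receiver absorb duplicates can be sketched from the receiver's side. The class `StopAndWaitReceiver` and its method are hypothetical; corruption is passed in as a flag instead of being detected by a checksum, and the return value stands for the seq # carried in the ACK.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a NAK-free stop-and-wait receiver with sequence numbers 0 and 1.
class StopAndWaitReceiver {
    private int expectedSeq = 0;
    final List<String> delivered = new ArrayList<>();

    // Returns the seq # of the packet being ACKed.
    int onPacket(int seq, String data, boolean corrupt) {
        if (!corrupt && seq == expectedSeq) {
            delivered.add(data);           // deliver up exactly once
            expectedSeq = 1 - expectedSeq; // alternate between 0 and 1
            return seq;
        }
        // Corrupt or duplicate packet: discard it and re-ACK the last packet received OK.
        return 1 - expectedSeq;
    }

    public static void main(String[] args) {
        StopAndWaitReceiver r = new StopAndWaitReceiver();
        System.out.println("ACK " + r.onPacket(0, "a", false));
        System.out.println("ACK " + r.onPacket(0, "a", false)); // retransmission after a lost ACK
        System.out.println("ACK " + r.onPacket(1, "b", false));
        System.out.println("delivered " + r.delivered);
    }
}
```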

Pipelined protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  • range of sequence numbers must be increased
  • buffering at sender and/or receiver
Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N
Sender:
  • k-bit seq # in pkt header
  • "window" of up to N consecutive unack'ed pkts allowed
  • ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  • timer for each in-flight pkt
  • timeout(n): retransmit pkt n and all higher seq # pkts in window
Receiver:
  • ACK-only: always send ACK for correctly-received pkt with highest in-order seq #; may generate duplicate ACKs; need only remember expectedseqnum
  • out-of-order pkt: discard (don't buffer) -> no receiver buffering! Re-ACK pkt with highest in-order seq #
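The sender-side window bookkeeping can be sketched as below. `GoBackNSender` is a hypothetical name; sequence numbers are plain integers that never wrap (the k-bit limit is ignored), and "transmitting" only appends to a log so the behaviour can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a Go-Back-N sender window (no wraparound, no real timers or I/O).
class GoBackNSender {
    final int windowSize;
    int base = 0;        // oldest unACKed seq #
    int nextSeqNum = 0;  // next seq # to use
    final List<Integer> transmitted = new ArrayList<>(); // log of seq #s put on the wire

    GoBackNSender(int windowSize) {
        this.windowSize = windowSize;
    }

    // Data from above: send only if the window is not full.
    boolean send() {
        if (nextSeqNum >= base + windowSize) {
            return false;
        }
        transmitted.add(nextSeqNum++);
        return true;
    }

    // Cumulative ACK: everything up to and including n is acknowledged.
    void onAck(int n) {
        if (n >= base && n < nextSeqNum) {
            base = n + 1;
        }
    }

    // timeout(n): retransmit pkt n and all higher seq # pkts already sent.
    void onTimeout(int n) {
        for (int seq = Math.max(n, base); seq < nextSeqNum; seq++) {
            transmitted.add(seq);
        }
    }

    public static void main(String[] args) {
        GoBackNSender s = new GoBackNSender(4);
        while (s.send()) { }   // fill the window: 0, 1, 2, 3
        s.onAck(1);            // window slides to base = 2
        s.send();              // seq 4 now fits
        s.onTimeout(2);        // go back: resend 2, 3, 4
        System.out.println(s.transmitted);
    }
}
```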

Selective Repeat
  • receiver individually acknowledges all correctly received pkts; buffers pkts, as needed, for eventual in-order delivery to upper layer
  • sender only resends pkts for which ACK not received; sender timer for each unACKed pkt
  • sender window: N consecutive seq #'s; again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat
sender:
  • data from above: if next available seq # in window, send pkt
  • timeout(n): resend pkt n, restart timer
  • ACK(n) in [sendbase, sendbase+N-1]: mark pkt n as received; if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  • pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  • pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
  • otherwise: ignore
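The three receiver cases can be sketched as follows. `SelectiveRepeatReceiver` is a hypothetical name; sequence numbers are assumed not to wrap, and the return value says whether ACK(n) is sent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the selective repeat receiver (no seq # wraparound).
class SelectiveRepeatReceiver {
    final int windowSize;
    int rcvBase = 0;
    private final Map<Integer, String> buffer = new HashMap<>();
    final List<String> delivered = new ArrayList<>();

    SelectiveRepeatReceiver(int windowSize) {
        this.windowSize = windowSize;
    }

    // Returns true if ACK(n) is sent, false if the packet is ignored.
    boolean onPacket(int n, String data) {
        if (n >= rcvBase && n <= rcvBase + windowSize - 1) {
            buffer.put(n, data);                   // out-of-order packets wait here
            while (buffer.containsKey(rcvBase)) {  // deliver the in-order run
                delivered.add(buffer.remove(rcvBase));
                rcvBase++;                         // advance to next not-yet-received pkt
            }
            return true;
        }
        if (n >= rcvBase - windowSize && n <= rcvBase - 1) {
            return true;                           // already delivered: just re-ACK
        }
        return false;                              // otherwise: ignore
    }

    public static void main(String[] args) {
        SelectiveRepeatReceiver r = new SelectiveRepeatReceiver(4);
        r.onPacket(1, "b");   // buffered, gap at 0
        r.onPacket(0, "a");   // fills the gap, both delivered
        System.out.println(r.delivered + " rcvBase=" + r.rcvBase);
    }
}
```

Re-ACKing packets below the window matters: without it, a sender whose ACK was lost would never advance its own window.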

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
  • point-to-point
  • reliable, in-order byte stream
  • pipelined
  • send & receive buffers
  • full duplex data
  • connection-oriented
  • flow controlled
Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door.

TCP segment structure
Figure: TCP segment, 32 bits wide:
  • source port #, dest port #
  • sequence number; acknowledgement number: counting by bytes of data (not segments!)
  • head len, not used, flag bits U A P R S F, Receive window: # bytes rcvr willing to accept
  • checksum: Internet checksum (as in UDP); Urg data pnter
  • Options (variable length)
  • application data (variable length)
Flags:
  • URG: urgent data (generally not used)
  • ACK: ACK # valid
  • PSH: push data now (generally not used)
  • RST, SYN, FIN: connection estab (setup, teardown commands)

TCP seq. #'s and ACKs
Seq. #'s:
  • byte stream "number" of first byte in segment's data
ACKs:
  • seq # of next byte expected from other side
Figure: simple telnet scenario between Host A and Host B.


Page 6: Chapter 3: Transport Layer. Our goals: understand principles behind transport layer services: multiplexing/demultiplexing, reliable data transfer.

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection establishment

(which can add delay) simple no connection state

at sender receiver small segment header no congestion control UDP

can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents as

sequence of 16-bit integers checksum addition (1rsquos

complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of received

segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Principles of Reliable data transfer important in app transport link layers characteristics of unreliable channel will determine

complexity of reliable data transfer protocol (rdt)

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

Rdt20 channel with bit errors

underlying channel may flip bits in packet checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) negative acknowledgements (NAKs) sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 error detection receiver feedback

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender adds sequence

number to each pkt sender retransmits current

pkt if ACKNAK garbled receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21

Sender seq added to pkt two seq rsquos (01) will

suffice must check if received

ACKNAK corrupted

Receiver must check if received

packet is duplicate note receiver can not

know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK duplicate ACK at sender results in same action as NAK

retransmit current pkt

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq ACKs

retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  • longer than RTT
  • too short: premature timeout
  • too long: slow reaction to segment loss

Q: how to estimate RTT?
  • SampleRTT: measured time from segment transmission until ACK receipt
  • SampleRTT will vary, want estimated RTT "smoother"

$$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$$

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout
  • EstimatedRTT plus "safety margin"
  • large variation in EstimatedRTT -> larger safety margin
  • first estimate of how much SampleRTT deviates from EstimatedRTT:

$$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$$

(typically, $\beta = 0.25$)

Then set timeout interval:

$$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$$
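The two moving averages and the timeout formula can be sketched as a small estimator. This is an illustration under stated assumptions: the class name `RttEstimator` is hypothetical, the weight 0.125 for EstimatedRTT and the initial values (first sample, and half of it for DevRTT) follow the common RFC 6298 convention rather than anything stated on the slide, and DevRTT is updated before EstimatedRTT so that it uses the previous estimate.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                  # weight of a new sample in EstimatedRTT (assumed)
        self.beta = beta                    # weight of a new deviation in DevRTT
        self.estimated_rtt = first_sample   # assumed initialisation
        self.dev_rtt = first_sample / 2     # assumed initialisation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample and then observing 120 ms, the estimate moves only to 102.5 ms, while the safety margin keeps the timeout well above either sample.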

TCP reliable data transfer

  • TCP creates reliable data transfer service on top of IP's unreliable service
  • Pipelined segments
  • Cumulative acks
  • Retransmissions are triggered by:
      • timeout events
      • duplicate acks

TCP sender events

data rcvd from app:
  • Create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running (think of timer as for oldest unacked segment)
  • expiration interval: TimeOutInterval

timeout:
  • retransmit segment that caused timeout
  • restart timer

Ack rcvd:
  • If acknowledges previously unacked segments
      • update what is known to be acked
      • start timer if there are outstanding segments

TCP retransmission scenarios

[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); the timer for Seq=92 times out and A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100.]

[Figure: premature timeout scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120, but the timer for Seq=92 expires first, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase goes from 100 to 120.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B's ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120.]

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of out-of-order segment, higher-than-expect seq #. Gap detected
TCP Receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte

Event at Receiver: Arrival of segment that partially or completely fills gap
TCP Receiver action: Immediate send ACK, provided that segment starts at lower end of gap

Fast Retransmit

  • Time-out period often relatively long
  • Detect lost segments via duplicate ACKs
  • If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase)
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        else    (a duplicate ACK for already ACKed segment)
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3)
                resend segment with sequence number y    (fast retransmit)
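The ACK-handling rule can be sketched in Python as follows. The class name `FastRetransmitSender` and its fields are hypothetical, the retransmission timer is left out for brevity, and "resend" is modelled by recording the sequence number rather than by transmitting anything.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers a resend on the third one."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # seq numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase and reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)
```

After one new ACK for byte 120 followed by three duplicates, the sender resends the segment starting at 120 exactly once; further duplicates do not trigger another resend in this sketch.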

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

  • speed-matching service: matching the send rate to the receiving app's drain rate
  • Rcvr advertises spare room by including value of RcvWindow in segments
  • Sender limits unACKed data to RcvWindow
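A rough sketch of both sides of this rule is given below. The function names are hypothetical, and the formula for the spare room (buffer size minus bytes received but not yet read) is the usual textbook definition, assumed here rather than taken from this slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises (assumed textbook definition)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sendable_bytes(rcv_win, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rcv_win - in_flight)
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes already in flight may send 1096 more.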

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  • specifies initial seq #
  • no data

Step 2: server host receives SYN, replies with SYNACK segment
  • server allocates buffers
  • specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
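The three steps can be traced with a toy function that builds the segments as dictionaries. The function name and dictionary keys are hypothetical; the "+1" reflects the standard TCP rule that a SYN consumes one sequence number, which the slide does not spell out.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK, and ACK segments as plain dictionaries."""
    # Step 1: client picks its initial seq #, sends SYN with no data.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server picks its own initial seq # and acknowledges the SYN.
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn, "ack": syn["seq"] + 1}
    # Step 3: client acknowledges the SYNACK (this segment may carry data).
    ack = {"flags": {"ACK"}, "seq": synack["ack"], "ack": synack["seq"] + 1}
    return [syn, synack, ack]
```

For initial sequence numbers 1000 (client) and 5000 (server), the final ACK carries Seq=1001 and ACK=5001.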

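TCP Connection Management (cont.)

[Figure: closing a connection. Client closes and sends FIN; server replies ACK, closes, and sends FIN; client replies ACK and enters timed wait; connection closed.]

TCP Connection Management (cont.)

[Figure: TCP client lifecycle]

[Figure: TCP server lifecycle]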

Principles of Congestion Control

Congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations:
      • lost packets (buffer overflow at routers)
      • long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • no retransmission

  • large delays when congested
  • maximum achievable throughput

[Figure: Host A and Host B send original data at rate λ_in into a router with unlimited shared output link buffers; delivered rate is λ_out]

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of lost packet

[Figure: Host A and Host B send into a router with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out: delivered rate]

  • always: λ_in = λ_out (goodput)
  • "perfect" retransmission only when loss: λ'_in > λ_out
  • retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: Host A and Host B send through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out: delivered rate]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

Network-assisted congestion control:
  • routers provide feedback to end systems

TCP Congestion Control

  • end-end control (no network assistance)
  • sender limits transmission:

$$LastByteSent - LastByteAcked \le CongWin$$

  • Roughly,

$$rate = \frac{CongWin}{RTT} \ \text{Bytes/sec}$$

  • CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
  • loss event = timeout or 3 duplicate acks
  • TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  • AIMD
  • slow start
  • conservative after timeout events

TCP AIMD

  • multiplicative decrease: cut CongWin in half after loss event
  • additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (ticks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP Slow Start

  • When connection begins, increase rate exponentially until first loss event:
      • double CongWin every RTT
      • done by incrementing CongWin for every ACK received
  • Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment to Host B, then two segments, then four segments, one batch per RTT]

Refinement

  • After 3 dup ACKs:
      • CongWin is cut in half
      • window then grows linearly (congestion avoidance)
  • But after timeout event:
      • CongWin instead set to 1 MSS
      • window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

  • When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
  • When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
  • When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
  • When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
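The four rules can be written down as a small state holder, updated once per RTT or per loss event. This is a simplified sketch: the class name `CongestionWindow` is hypothetical, the window is measured in units of MSS, growth is applied per RTT rather than per ACK, and capping slow start at Threshold (so the window never jumps past it) is an assumption of this sketch.

```python
MSS = 1  # window is measured in units of MSS in this sketch

class CongestionWindow:
    def __init__(self, threshold):
        self.cong_win = 1.0 * MSS
        self.threshold = float(threshold)

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            # Slow start: double per RTT, capped at Threshold (assumed simplification).
            self.cong_win = min(self.cong_win * 2, self.threshold)
        else:
            # Congestion avoidance: add 1 MSS per RTT.
            self.cong_win += MSS

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0 * MSS
```

With Threshold = 8, the window goes 1, 2, 4, 8, 9, 10 over five loss-free RTTs; a triple duplicate ACK at 10 sets both Threshold and CongWin to 5, while a timeout would instead restart from 1 MSS.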

TCP throughput

  • What's the average throughput of TCP as a function of window size and RTT?
      • Ignore slow start
  • Let W be the window size when loss occurs.
  • When window is W, throughput is W/RTT
  • Just after loss, window drops to W/2, throughput to W/(2 RTT).
  • Average throughput: 0.75 W/RTT
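The 0.75 factor follows if one assumes the window grows linearly from W/2 back to W between loss events, so that the average throughput is the midpoint of the two extremes. A short worked step under that assumption:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
```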

TCP Futures

  • Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
  • Requires window size W = 83,333 in-flight segments
  • Throughput in terms of loss rate L:

$$\frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

  • so L = 2 · 10^-10. Wow!
  • New versions of TCP for high-speed needed!
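The numbers in this example can be checked with a few lines of arithmetic. The function names below are hypothetical; the loss rate is obtained by solving the throughput formula 1.22 · MSS / (RTT · sqrt(L)) for L, with MSS converted to bits.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L that yields the target throughput under the 1.22*MSS/(RTT*sqrt(L)) model."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

w = required_window_segments(10e9, 0.1, 1500)   # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)      # about 2e-10
```

Both values agree with the slide: roughly 83,333 in-flight segments and a tolerable loss rate on the order of 2 · 10^-10.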

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
  • Additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
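The convergence argument can be illustrated with a toy simulation of two flows sharing one link. This is a sketch under strong assumptions: both flows see every loss at the same time, both have the same RTT, rates are in arbitrary units, and the function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Run synchronized AIMD for two flows; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both flows add one unit.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2

a, b = aimd_two_flows(10.0, 80.0, capacity=100.0, rounds=1000)
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the rates end up practically equal however unequal the starting point.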

Chapter 3: Summary

  • principles behind transport layer services:
      • multiplexing, demultiplexing
      • reliable data transfer
      • flow control
      • congestion control
  • instantiation and implementation in the Internet
      • UDP
      • TCP

Next:
  • leaving the network "edge" (application, transport layers)
  • into the network "core"

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont.)
  • Connection-oriented demux (cont.)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw!
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont.)
  • TCP Connection Management (cont.)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3: Summary
Page 7: Chapter 3: Transport Layer. Our goals: understand principles behind transport layer services: multiplexing/demultiplexing, reliable data transfer.

UDP: User Datagram Protocol [RFC 768]

  • "no frills," "bare bones" Internet transport protocol
  • "best effort" service, UDP segments may be:
      • lost
      • delivered out of order to app
  • connectionless:
      • no handshaking between UDP sender, receiver
      • each UDP segment handled independently of others

Why is there a UDP?
  • no connection establishment (which can add delay)
  • simple: no connection state at sender, receiver
  • small segment header
  • no congestion control: UDP can blast away as fast as desired

UDP: more

  • often used for streaming multimedia apps
      • loss tolerant
      • rate sensitive
  • other UDP uses
      • DNS
      • SNMP
  • reliable transfer over UDP: add reliability at application layer
      • application-specific error recovery!

UDP segment format (each row is 32 bits wide):
  source port # | dest port #
  length | checksum
  Application data (message)

  • length: Length, in bytes, of UDP segment, including header

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
  • treat segment contents as sequence of 16-bit integers
  • checksum: addition (1's complement sum) of segment contents
  • sender puts checksum value into UDP checksum field

Receiver:
  • compute checksum of received segment
  • check if computed checksum equals checksum field value:
      • NO - error detected
      • YES - no error detected. But maybe errors nonetheless? More later ...
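A minimal sketch of the 16-bit one's complement checksum described above. The function names are hypothetical. Two details go beyond the slide and are assumptions of this sketch: the value stored in the field is the bitwise complement of the sum (as in the standard Internet checksum), and odd-length input is padded with a zero byte. The real UDP checksum also covers a pseudo-header, which is omitted here.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add the data as 16-bit big-endian words, wrapping any carry back in."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words (assumption)
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def internet_checksum(data: bytes) -> int:
    """Sender side: complement of the one's complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF

def verify(data: bytes, checksum: int) -> bool:
    """Receiver side: sum plus checksum must be all ones if no error is detected."""
    total = ones_complement_sum16(data) + checksum
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF
```

A single flipped bit in the data makes `verify` fail, but two errors that cancel in the sum would pass, which is the "maybe errors nonetheless" caveat on the slide.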

Principles of Reliable data transfer

  • important in app., transport, link layers
  • characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Rdt1.0: reliable transfer over a reliable channel

  • underlying channel perfectly reliable
      • no bit errors
      • no loss of packets

Rdt2.0: channel with bit errors

  • underlying channel may flip bits in packet
      • checksum to detect bit errors
  • the question: how to recover from errors:
      • acknowledgements (ACKs)
      • negative acknowledgements (NAKs)
      • sender retransmits pkt on receipt of NAK
  • new mechanisms in rdt2.0:
      • error detection
      • receiver feedback

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
  • sender doesn't know what happened at receiver!
  • can't just retransmit: possible duplicate

Handling duplicates:
  • sender adds sequence number to each pkt
  • sender retransmits current pkt if ACK/NAK garbled
  • receiver discards (doesn't deliver up) duplicate pkt

stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1

Sender:
  • seq # added to pkt
  • two seq #'s (0,1) will suffice
  • must check if received ACK/NAK corrupted

Receiver:
  • must check if received packet is duplicate
  • note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol

  • same functionality as rdt2.1, using ACKs only
  • instead of NAK, receiver sends ACK for last pkt received OK
  • duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  • checksum, seq #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
  • retransmits if no ACK received in this time
  • if pkt (or ACK) just delayed (not lost):
      • retransmission will be duplicate, but use of seq #'s already handles this
      • receiver must specify seq # of pkt being ACKed
  • requires countdown timer

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  • range of sequence numbers must be increased
  • buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:
  • k-bit seq # in pkt header
  • "window" of up to N, consecutive unack'ed pkts allowed
  • ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  • timer for each in-flight pkt
  • timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:
  • ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
      • may generate duplicate ACKs
      • need only remember expectedseqnum
  • out-of-order pkt:
      • discard (don't buffer) -> no receiver buffering!
      • Re-ACK pkt with highest in-order seq #
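The Go-Back-N sender and receiver rules can be sketched as two small classes. The names `GbnSender` and `GbnReceiver` are hypothetical, sequence numbers are assumed unbounded (no k-bit wraparound), a single timeout stands in for the per-packet timers, and "transmitting" is modelled by appending to a log.

```python
class GbnSender:
    def __init__(self, window_size):
        self.n = window_size
        self.base = 0        # oldest unACKed seq #
        self.next_seq = 0    # next seq # to use
        self.sent = []       # log of every transmission, in order

    def send(self):
        """Send the next packet if the window has room; return True if sent."""
        if self.next_seq < self.base + self.n:
            self.sent.append(self.next_seq)
            self.next_seq += 1
            return True
        return False         # window full: refuse data from above

    def on_ack(self, n):
        # Cumulative ACK: everything up to and including n is acknowledged.
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self):
        # Go back N: resend every packet that is still unACKed.
        for seq in range(self.base, self.next_seq):
            self.sent.append(seq)


class GbnReceiver:
    def __init__(self):
        self.expected = 0    # expectedseqnum

    def on_packet(self, seq):
        """Return the seq # to ACK: the highest in-order packet so far (-1 if none)."""
        if seq == self.expected:
            self.expected += 1
        return self.expected - 1   # out-of-order packets are discarded and re-ACKed
```

With a window of 4, if packet 1 is lost the receiver keeps answering ACK 0 to packets 2 and 3, and the sender's timeout resends 1, 2, and 3.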


Page 8: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific error

recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents as

sequence of 16-bit integers checksum addition (1rsquos

complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of received

segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Principles of Reliable data transfer important in app transport link layers characteristics of unreliable channel will determine

complexity of reliable data transfer protocol (rdt)

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

Rdt20 channel with bit errors

underlying channel may flip bits in packet checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) negative acknowledgements (NAKs) sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 error detection receiver feedback

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender adds sequence

number to each pkt sender retransmits current

pkt if ACKNAK garbled receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21

Sender seq added to pkt two seq rsquos (01) will

suffice must check if received

ACKNAK corrupted

Receiver must check if received

packet is duplicate note receiver can not

know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK duplicate ACK at sender results in same action as NAK

retransmit current pkt

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq ACKs

retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

[Figure: segment layout, each row 32 bits wide]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pnter
  Options (variable length)
  application data (variable length)

URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
Receive window: # bytes rcvr willing to accept
sequence number, acknowledgement number: counting by bytes of data (not segments!)
checksum: Internet checksum (as in UDP)

TCP seq. #'s and ACKs

Seq. #'s: byte stream "number" of first byte in segment's data
ACKs: seq # of next byte expected from other side

[Figure: simple telnet scenario between Host A and Host B, time running downward]
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
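The numbers in the telnet scenario follow mechanically from the two rules: a segment's sequence number is the byte number of its first data byte, and its ACK number is the next byte expected from the other side. A small sketch, where the dictionary layout and the helper `ack_for` are hypothetical names used only for this example:

```python
def ack_for(seq, data):
    """ACK number for a received segment: the next byte expected."""
    return seq + len(data)


# Host A -> Host B: the user types 'C'.
seg1 = {"seq": 42, "ack": 79, "data": b"C"}

# Host B -> Host A: ACKs the 'C' and echoes it back.
seg2 = {"seq": seg1["ack"], "ack": ack_for(seg1["seq"], seg1["data"]), "data": b"C"}

# Host A -> Host B: ACKs the echoed 'C' (no data).
seg3 = {"seq": seg2["ack"], "ack": ack_for(seg2["seq"], seg2["data"]), "data": b""}

for seg in (seg1, seg2, seg3):
    print(seg)
```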

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  longer than RTT
  too short: premature timeout
  too long: slow reaction to segment loss

Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
  SampleRTT will vary, want estimated RTT "smoother"

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, Estimated RTT]

TCP Round Trip Time and Timeout

Setting the timeout
  EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  first estimate of how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
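The two moving averages and the timeout can be combined into a small estimator. This is a sketch under stated assumptions: the class name `RttEstimator` is hypothetical, alpha defaults to 0.125 (the commonly used value, not stated on these slides), the first sample seeds EstimatedRTT with DevRTT starting at 0, and DevRTT is updated before EstimatedRTT so that it uses the previous estimate.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # seed with the first measurement
        self.dev_rtt = 0.0                  # assumption: no deviation known yet

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
timeout = est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, timeout)
```

A single 120 ms sample moves the estimate only to 102.5 ms, but the deviation term pushes the timeout out to 122.5 ms, which is the "safety margin" at work.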

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service
Pipelined segments
Cumulative acks
Retransmissions are triggered by:
  timeout events
  duplicate acks

TCP sender events

data rcvd from app:
  Create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running (think of timer as for oldest unacked segment)
  expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
Ack rcvd:
  If acknowledges previously unacked segments
    update what is known to be acked
    start timer if there are outstanding segments

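TCP retransmission scenarios

[Figure: lost ACK scenario, Host A and Host B]
  Host A sends Seq=92, 8 bytes data
  Host B's ACK=100 is lost (X)
  Seq=92 timeout: Host A resends Seq=92, 8 bytes data
  Host B sends ACK=100 again; SendBase = 100

[Figure: premature timeout, Host A and Host B]
  Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  Host B sends ACK=100 and ACK=120
  Seq=92 timeout fires before the ACKs arrive: Host A resends Seq=92, 8 bytes data
  SendBase = 100, then SendBase = 120
  Host B answers the resent segment with ACK=120; SendBase = 120

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario, Host A and Host B]
  Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  ACK=100 is lost (X), but ACK=120 arrives before the timeout
  SendBase = 120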

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of out-of-order segment, higher-than-expect seq #. Gap detected
TCP Receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte

Event at Receiver: Arrival of segment that partially or completely fills gap
TCP Receiver action: Immediate send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long
Detect lost segments via duplicate ACKs
If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                       (a duplicate ACK for already ACKed segment)
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y    (fast retransmit)
    }
  }
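The ACK-handling pseudocode translates almost directly into Python. In this sketch the class name `FastRetransmitSender` and the `retransmitted` list are hypothetical stand-ins for a real sender; the retransmission timer is not modeled.

```python
from collections import Counter


class FastRetransmitSender:
    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = Counter()     # dup ACK count per ACK value y
        self.retransmitted = []       # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase.
            # (A real sender would restart its timer here if segments remain.)
            self.send_base = y
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3:
                # Fast retransmit: resend the segment starting at byte y
                # without waiting for the timeout.
                self.retransmitted.append(y)


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100, 100, 120]:
    sender.on_ack(ack)
print(sender.send_base, sender.retransmitted)
```

The first ACK=100 is new; the next three are duplicates and trigger exactly one retransmission of the segment at byte 100, and the fourth duplicate does not trigger another.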

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
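The sender-side limit amounts to one subtraction. A minimal sketch, assuming the sender tracks `last_byte_sent` and `last_byte_acked` counters (the function name `sendable_bytes` is hypothetical):

```python
def sendable_bytes(last_byte_sent, last_byte_acked, rcv_window):
    """How many more bytes the sender may transmit right now."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data
    # Never let unACKed data exceed the advertised RcvWindow.
    return max(0, rcv_window - in_flight)


room = sendable_bytes(last_byte_sent=5000, last_byte_acked=3000, rcv_window=4096)
blocked = sendable_bytes(last_byte_sent=8000, last_byte_acked=3000, rcv_window=4096)
print(room, blocked)
```

When the receiving application drains its buffer slowly, RcvWindow shrinks and the second case applies: the sender has to wait.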

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  specifies initial seq #
  no data
Step 2: server host receives SYN, replies with SYNACK segment
  server allocates buffers
  specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont.)

[Figure: closing a connection. The client closes and sends FIN; the server replies with ACK, then closes and sends its own FIN; the client replies with ACK and enters timed wait, after which the connection is closed]

TCP Connection Management (cont.)

[Figure: TCP client lifecycle; TCP server lifecycle]

Principles of Congestion Control

Congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

two senders, two receivers
one router, infinite buffers
no retransmission

large delays when congested
maximum achievable throughput

[Figure: Host A sends $\lambda_{in}$ (original data) through a router with unlimited shared output link buffers; Host B receives $\lambda_{out}$]

Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of lost packet

[Figure: Host A and Host B share finite output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

always: $\lambda_{in} = \lambda_{out}$ (goodput)
"perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:
  more work (retrans) for given "goodput"
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: Host A and Host B share finite output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

Network-assisted congestion control:
  routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)
sender limits transmission:

$\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$

Roughly,

$\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ Bytes/sec

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
  loss event = timeout or 3 duplicate acks
  TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  AIMD
  slow start
  conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one burst per RTT]

Refinement

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly (congestion avoidance)
But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
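The four summary rules can be written as a small state-update function. This is a per-RTT sketch rather than per-ACK behaviour, with the window measured in whole MSS units; the name `next_state`, the integer halving, the cap of slow-start growth at Threshold, and treating CongWin equal to Threshold as congestion avoidance are assumptions made for the example.

```python
def next_state(cong_win, threshold, event):
    """Return the new (cong_win, threshold), both in units of MSS.

    event is "rtt" (one loss-free round trip), "triple_dup_ack", or "timeout".
    """
    if event == "rtt":
        if cong_win < threshold:
            # Slow start: double per RTT (capped at Threshold, an assumption).
            cong_win = min(2 * cong_win, threshold)
        else:
            # Congestion avoidance: grow by 1 MSS per RTT.
            cong_win += 1
    elif event == "triple_dup_ack":
        threshold = max(cong_win // 2, 1)
        cong_win = threshold
    elif event == "timeout":
        threshold = max(cong_win // 2, 1)
        cong_win = 1
    else:
        raise ValueError(f"unknown event: {event}")
    return cong_win, threshold


events = ["rtt"] * 5 + ["triple_dup_ack", "rtt", "timeout", "rtt", "rtt", "rtt"]
cong_win, threshold = 1, 8
trace = []
for event in events:
    cong_win, threshold = next_state(cong_win, threshold, event)
    trace.append(cong_win)
print(trace)
```

The trace shows all three behaviours: exponential growth to the threshold, linear growth above it, a halving after the triple duplicate ACK, and a restart from 1 MSS after the timeout.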

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
  Ignore slow start
Let W be the window size when loss occurs.
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2 RTT)
Average throughput: 0.75 W/RTT
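The 0.75 factor is the mean of a sawtooth. A short derivation, assuming the window climbs linearly from W/2 back to W between loss events (the idealized AIMD pattern with slow start ignored), so that its time average is the midpoint of the two extremes:

```latex
\bar{w} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{\bar{w}}{\mathrm{RTT}} = 0.75\,\frac{W}{\mathrm{RTT}}
```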

TCP Futures

Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate:

$\dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$ Wow
New versions of TCP for high-speed needed!
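Both numbers in the example can be checked with a few lines of arithmetic. The variable names below are illustrative; the only inputs are the segment size, RTT, and target rate from the example, plus the throughput formula solved for L.

```python
MSS_BITS = 1500 * 8      # 1500-byte segments, in bits
RTT = 0.100              # seconds
TARGET_BPS = 10e9        # 10 Gbps

# Window needed to keep the pipe full: rate * RTT, counted in segments.
window_segments = TARGET_BPS * RTT / MSS_BITS

# throughput = 1.22 * MSS / (RTT * sqrt(L))  =>  sqrt(L) = 1.22 * MSS / (RTT * throughput)
sqrt_loss = 1.22 * MSS_BITS / (RTT * TARGET_BPS)
loss_rate = sqrt_loss ** 2

print(f"W = {window_segments:.0f} segments")
print(f"L = {loss_rate:.2e}")
```

The required loss rate comes out at roughly 2.1e-10, about one lost segment in five billion, which is why the example calls for new high-speed versions of TCP.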

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
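The convergence argument can be tried numerically. This is a toy model, not a TCP simulation: both flows are assumed to see every loss at the same moment and to have the same RTT, and the function name `aimd_two_flows` and the starting rates are made up for the example.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two flows' rates under synchronized AIMD; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve (multiplicative decrease),
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both add 1 (additive increase),
            # which leaves the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Very unequal start: flow 1 at 1 unit, flow 2 at 60 units, link capacity R = 100.
x1, x2 = aimd_two_flows(1.0, 60.0, capacity=100.0, rounds=500)
print(x1, x2, abs(x1 - x2))
```

Because only the decrease step changes the difference between the two rates, and always by a factor of 2, the flows drift toward the equal bandwidth share line.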

Chapter 3: Summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation and implementation in the Internet
  UDP
  TCP

Next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

  • Chapter 3 Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0 reliable transfer over a reliable channel
  • Rdt2.0 channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2 a NAK-free protocol
  • rdt3.0 channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • TCP Overview RFCs 793, 1122, 1323, 2018, 2581
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion scenario 1
  • Causes/costs of congestion scenario 2
  • Causes/costs of congestion scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair
  • Chapter 3 Summary
Page 9: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

UDP checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
  treat segment contents as sequence of 16-bit integers
  checksum: addition (1's complement sum) of segment contents
  sender puts checksum value into UDP checksum field

Receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later ...
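A sketch of the checksum computation follows. It makes two simplifications that should be read as assumptions: it covers only the bytes passed in (real UDP also sums a pseudo-header), and it uses the usual Internet-checksum convention of storing the complement of the 1's complement sum, so that a receiver summing data plus checksum gets zero after complementing. The function name `internet_checksum` is chosen for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit 1's complement of the 1's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                     # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # wrap any carry back around
    return ~total & 0xFFFF


segment = b"\x12\x34\x56\x78"               # even length keeps the words aligned
checksum = internet_checksum(segment)

# Receiver: recompute over data plus checksum field; 0 means no error detected.
ok = internet_checksum(segment + checksum.to_bytes(2, "big"))

# Flip one bit in transit and check again.
corrupted = b"\x12\x35\x56\x78"
bad = internet_checksum(corrupted + checksum.to_bytes(2, "big"))

print(hex(checksum), ok, hex(bad))
```

A nonzero result at the receiver signals an error; a zero result only means no error was detected, since some multi-bit errors can cancel out.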

Principles of Reliable data transfer

important in app., transport, link layers
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Rdt1.0: reliable transfer over a reliable channel

underlying channel perfectly reliable
  no bit errors
  no loss of packets

Rdt2.0: channel with bit errors

underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs)
  negative acknowledgements (NAKs)
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0:
  error detection
  receiver feedback

rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
  sender doesn't know what happened at receiver!
  can't just retransmit: possible duplicate

Handling duplicates:
  sender adds sequence number to each pkt
  sender retransmits current pkt if ACK/NAK garbled
  receiver discards (doesn't deliver up) duplicate pkt

stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1

Sender:
  seq # added to pkt
  two seq. #'s (0,1) will suffice
  must check if received ACK/NAK corrupted

Receiver:
  must check if received packet is duplicate
  note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol

same functionality as rdt2.1, using ACKs only
instead of NAK, receiver sends ACK for last pkt received OK
duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
  retransmits if no ACK received in this time
  if pkt (or ACK) just delayed (not lost):
    retransmission will be duplicate, but use of seq. #'s already handles this
    receiver must specify seq # of pkt being ACKed
  requires countdown timer
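The interplay of timeouts, retransmissions, and the one-bit sequence number can be seen in a deterministic toy run. Everything here is an illustrative assumption: the function name `stop_and_wait`, the scripted list of channel outcomes in place of a real lossy channel and timer, and the omission of bit errors and of ACK sequence numbers.

```python
def stop_and_wait(messages, outcomes):
    """Send messages one at a time over a scripted lossy channel.

    outcomes lists what happens to each transmission attempt:
    "ok", "lose_data", or "lose_ack" (further attempts default to "ok").
    Returns (payloads delivered at the receiver, total transmission attempts).
    """
    outcomes = iter(outcomes)
    delivered = []
    expected = 0      # seq bit the receiver is waiting for
    seq = 0           # seq bit of the sender's current packet
    attempts = 0
    for msg in messages:
        while True:
            attempts += 1
            outcome = next(outcomes, "ok")
            if outcome == "lose_data":
                continue              # no ACK arrives: timeout, retransmit
            # The data packet reached the receiver.
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            # else: duplicate, discarded (but still ACKed).
            if outcome == "lose_ack":
                continue              # ACK lost: timeout, retransmit
            break                     # ACK received: move to next packet
        seq ^= 1
    return delivered, attempts


delivered, attempts = stop_and_wait(["a", "b"], ["lose_data", "lose_ack", "ok", "ok"])
print(delivered, attempts)
```

The third attempt carries a duplicate of "a" (its ACK was lost), and the sequence bit is what lets the receiver discard it instead of delivering it twice.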

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:
  k-bit seq # in pkt header
  "window" of up to N, consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  timer for each in-flight pkt
  timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer) -> no receiver buffering!
    Re-ACK pkt with highest in-order seq #



Principles of Reliable data transfer important in app transport link layers characteristics of unreliable channel will determine

complexity of reliable data transfer protocol (rdt)

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

Rdt20 channel with bit errors

underlying channel may flip bits in packet checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) negative acknowledgements (NAKs) sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 error detection receiver feedback

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender adds sequence

number to each pkt sender retransmits current

pkt if ACKNAK garbled receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21

Sender seq added to pkt two seq rsquos (01) will

suffice must check if received

ACKNAK corrupted

Receiver must check if received

packet is duplicate note receiver can not

know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK duplicate ACK at sender results in same action as NAK

retransmit current pkt

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq ACKs

retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT too short premature

timeout too long slow reaction to

segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

SampleRTT will vary want estimated RTT ldquosmootherrdquo

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus "safety margin"
large variation in EstimatedRTT -> larger safety margin

First estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4 · DevRTT
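The estimator can be sketched in a few lines. This is a minimal illustration, not a production timer: the class name `RttEstimator` is hypothetical, `alpha = 0.125` is an assumed weight (the slide gives only the 0.25 value for the deviation), the initial deviation of half the first sample is an assumption, and updating DevRTT before EstimatedRTT is one possible ordering.

```python
class RttEstimator:
    """Exponentially weighted RTT and deviation estimates for a timeout."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha                  # assumed weight for EstimatedRTT
        self.beta = beta                    # 0.25, the typical value on the slide
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2     # assumed initial deviation

    def update(self, sample_rtt: float) -> float:
        # Deviation is measured against the previous estimate, then the estimate moves.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample, a single 200 ms sample moves EstimatedRTT only to 112.5 ms but raises the timeout sharply, because the deviation term carries the safety margin.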

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service
Pipelined segments
Cumulative ACKs
Retransmissions are triggered by: timeout events, duplicate ACKs

TCP sender events

data rcvd from app:
Create segment with seq #
seq # is byte-stream number of first data byte in segment
start timer if not already running (think of timer as for oldest unACKed segment)
expiration interval: TimeOutInterval

timeout:
retransmit segment that caused timeout
restart timer

ACK rcvd:
If acknowledges previously unACKed segments: update what is known to be ACKed; start timer if there are outstanding segments
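The three sender events can be sketched as methods on a small class. This is a simplified illustration under stated assumptions: `SimpleTcpSender` and its attribute names are hypothetical, the timer is a boolean flag rather than a real clock, and flow control and congestion control are left out.

```python
class SimpleTcpSender:
    """Toy sender: one timer, cumulative ACKs, no flow or congestion control."""

    def __init__(self, initial_seq: int = 0):
        self.send_base = initial_seq      # oldest unACKed byte
        self.next_seq = initial_seq       # seq # for the next new byte
        self.unacked = {}                 # seq # -> data still awaiting an ACK
        self.timer_running = False
        self.sent = []                    # log of (seq, data) handed to the network

    def on_data_from_app(self, data: bytes) -> None:
        self.unacked[self.next_seq] = data
        self.sent.append((self.next_seq, data))
        self.next_seq += len(data)
        if not self.timer_running:        # timer covers the oldest unACKed segment
            self.timer_running = True

    def on_timeout(self) -> None:
        if self.unacked:
            oldest = min(self.unacked)
            self.sent.append((oldest, self.unacked[oldest]))  # retransmit
            self.timer_running = True                         # restart timer

    def on_ack(self, ack: int) -> None:
        if ack > self.send_base:          # ACK covers previously unACKed data
            self.send_base = ack
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= ack]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Because ACKs are cumulative, a single `on_ack(120)` clears both an 8-byte segment at 92 and a 20-byte segment at 100.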

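TCP: retransmission scenarios

Lost ACK scenario (Host A sends to Host B):

Host A sends Seq=92, 8 bytes data
Host B replies ACK=100; the ACK is lost (X)
Seq=92 timeout: Host A resends Seq=92, 8 bytes data
Host B replies ACK=100 again
SendBase = 100

Premature timeout scenario (Host A sends to Host B):

Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
Host B replies ACK=100, then ACK=120
Seq=92 timeout fires before the ACKs reach Host A: Host A resends Seq=92, 8 bytes data
SendBase = 100 when ACK=100 arrives, SendBase = 120 when ACK=120 arrives
Host B answers the retransmission with ACK=120
SendBase = 120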

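TCP retransmission scenarios (more)

Cumulative ACK scenario (Host A sends to Host B):

Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
Host B replies ACK=100, which is lost (X), then ACK=120
ACK=120 reaches Host A before the timeout, so nothing is retransmitted
SendBase = 120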

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment, higher-than-expected seq #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long
Detect lost segments via duplicate ACKs
If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
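The ACK-handling pseudocode translates almost line for line into runnable form. A minimal sketch, assuming a hypothetical `FastRetransmitSender` class that only records which sequence numbers it would resend; timers and partial ACKs that land inside a segment are not modelled.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate-ACK counts for fast retransmit."""

    def __init__(self, send_base: int, segments: dict):
        self.send_base = send_base
        self.segments = segments          # seq # -> data for unACKed segments
        self.dup_acks = 0
        self.retransmitted = []           # seq #s resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance SendBase and forget ACKed segments.
            self.send_base = y
            self.dup_acks = 0
            self.segments = {s: d for s, d in self.segments.items() if s >= y}
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.segments:
                self.retransmitted.append(y)   # resend before the timer expires
```

After one new ACK for byte 100 followed by three duplicates, the segment starting at 100 is the one resent.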

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
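Both sides of the flow-control rule fit in two small functions. This is a sketch under assumptions: the names `rcv_buffer`, `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are bookkeeping variables introduced for illustration, and "spare room" is taken to be the buffer size minus the bytes received but not yet read by the application.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, in bytes (the advertised RcvWindow)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes: int, last_byte_sent: int, last_byte_acked: int, window: int) -> bool:
    """True if sending nbytes keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may send 1000 more, while one with 1200 in flight may not.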

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server
specifies initial seq #
no data

Step 2: server host receives SYN, replies with SYNACK segment
server allocates buffers
specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
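The sequence and acknowledgement numbers exchanged in the three steps can be traced with a short function. A minimal sketch: `three_way_handshake` and its tuple layout are hypothetical, and it relies on the fact that a SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus one.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    syn = ("client", {"SYN"}, client_isn, None)                      # step 1: no data
    synack = ("server", {"SYN", "ACK"}, server_isn, client_isn + 1)  # step 2
    ack = ("client", {"ACK"}, client_isn + 1, server_isn + 1)        # step 3
    return [syn, synack, ack]
```

For example, `three_way_handshake(42, 79)` yields a SYNACK with seq 79 and ack 43, followed by a client ACK with seq 43 and ack 80.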

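TCP Connection Management (cont.)

[Figure: closing a connection. Client closes and sends FIN; server replies ACK, closes, and sends FIN; client replies ACK and enters timed wait; connection closed]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle; TCP server lifecycle]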

Principles of Congestion Control

Congestion:
informally: "too many sources sending too much data too fast for network to handle"
different from flow control!
manifestations: lost packets (buffer overflow at routers), long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

two senders, two receivers
one router, infinite buffers
no retransmission

large delays when congested
maximum achievable throughput

[Figure: Host A and Host B send original data at rate λin into a router with unlimited shared output link buffers; data is delivered at rate λout]

Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of lost packet

[Figure: Host A sends original data at rate λin, and original data plus retransmitted data at rate λ'in, into a router with finite shared output link buffers; Host B receives at rate λout]

always: λin = λout (goodput)
"perfect" retransmission only when loss: λ'in > λout
retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout

"costs" of congestion:
more work (retrans) for given "goodput"
unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: Host A sends original data at rate λin, and original data plus retransmitted data at rate λ'in, through routers with finite shared output link buffers; Host B receives at rate λout]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP

Network-assisted congestion control:
routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)
sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
Roughly, rate = CongWin / RTT bytes/sec
CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
loss event = timeout or 3 duplicate ACKs
TCP sender reduces rate (CongWin) after loss event

three mechanisms:
AIMD
slow start
conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window of a long-lived TCP connection. x-axis: time; y-axis: congestion window, with ticks at 8, 16 and 24 Kbytes]

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:
double CongWin every RTT
done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment to Host B, then two segments, then four segments, one batch per RTT]

Refinement

After 3 dup ACKs:
CongWin is cut in half
window then grows linearly (congestion avoidance)

But after timeout event:
CongWin instead set to 1 MSS
window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
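The four summary rules can be sketched as a tiny state machine. This is an illustration under assumptions: `CongestionControl` is a hypothetical class, the window is kept in units of MSS, the initial `threshold` of 8 is arbitrary, and the per-ACK increase of 1/CongWin in congestion avoidance is one common way to approximate "1 MSS every RTT".

```python
class CongestionControl:
    """CongWin and Threshold in units of MSS, driven by ACK and loss events."""

    def __init__(self, threshold: float = 8.0):
        self.cong_win = 1.0
        self.threshold = threshold        # assumed initial value

    def on_new_ack(self) -> None:
        if self.cong_win < self.threshold:
            self.cong_win += 1.0                    # slow start: doubles per RTT
        else:
            self.cong_win += 1.0 / self.cong_win    # roughly +1 MSS per RTT

    def on_triple_dup_ack(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0
```

The only difference between the two loss handlers is where CongWin restarts: at the new Threshold after a triple duplicate ACK, at 1 MSS after a timeout.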

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
Ignore slow start

Let W be the window size when loss occurs.
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2 · RTT)
Average throughput: 0.75 W/RTT

TCP Futures

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate:

throughput = 1.22 · MSS / (RTT · √L)

➜ L = 2 · 10^-10  Wow!

New versions of TCP for high-speed needed!
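The two numbers in this example can be checked directly. A small sketch, assuming the loss-throughput relation throughput = 1.22 · MSS / (RTT · √L) and two hypothetical helper names; throughput is in bits per second, so the segment size is converted from bytes to bits.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = window_segments(10e9, 0.100, 1500)
L = required_loss_rate(10e9, 0.100, 1500)
```

`W` comes out just above 83,333 segments and `L` close to 2 · 10^-10, in line with the figures on the slide.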

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
Additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
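The convergence argument can be tried numerically. A minimal sketch under simplifying assumptions: `aimd_two_flows` is a hypothetical function, both sessions see the same RTT, both detect loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple:
    """Both flows add 1 per round; both halve when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged while each multiplicative decrease halves it, so sessions that start at 1 and 41 end up with nearly equal shares after a few hundred rounds.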

Chapter 3: Summary

principles behind transport layer services:
multiplexing, demultiplexing
reliable data transfer
flow control
congestion control

instantiation and implementation in the Internet:
UDP
TCP

Next:
leaving the network "edge" (application, transport layers)
into the network "core"

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont.)
  • Connection-oriented demux (cont.)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont.)
  • TCP Connection Management (cont.)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3: Summary

Rdt1.0: reliable transfer over a reliable channel

underlying channel perfectly reliable
no bit errors
no loss of packets

Rdt2.0: channel with bit errors

underlying channel may flip bits in packet
checksum to detect bit errors

the question: how to recover from errors?
acknowledgements (ACKs)
negative acknowledgements (NAKs)
sender retransmits pkt on receipt of NAK

new mechanisms in rdt2.0:
error detection
receiver feedback

rdt2.0 has a fatal flaw

What happens if ACK/NAK corrupted?
sender doesn't know what happened at receiver!
can't just retransmit: possible duplicate

Handling duplicates:
sender adds sequence number to each pkt
sender retransmits current pkt if ACK/NAK garbled
receiver discards (doesn't deliver up) duplicate pkt

stop and wait: sender sends one packet, then waits for receiver response

rdt2.1

Sender:
seq # added to pkt
two seq. #'s (0, 1) will suffice
must check if received ACK/NAK corrupted

Receiver:
must check if received packet is duplicate
note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol

same functionality as rdt2.1, using ACKs only
instead of NAK, receiver sends ACK for last pkt received OK
duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
retransmits if no ACK received in this time
if pkt (or ACK) just delayed (not lost): retransmission will be duplicate, but use of seq. #'s already handles this
receiver must specify seq # of pkt being ACKed
requires countdown timer
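The stop-and-wait behaviour with a timer and a one-bit sequence number can be simulated deterministically. This is a sketch under stated assumptions: `stop_and_wait`, `drop_data` and `drop_ack` are hypothetical names, losses are scripted by transmission attempt number instead of being random, a loss always leads to a timeout and retransmission, and bit errors are not modelled.

```python
def stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Simulate a stop-and-wait sender and receiver with alternating 0/1 seq #.

    drop_data and drop_ack hold the 0-based attempt numbers (counted over all
    transmissions) whose data packet or ACK is lost.
    """
    delivered, expected, attempt = [], 0, 0
    for i, msg in enumerate(messages):
        seq = i % 2
        while True:
            this_attempt = attempt
            attempt += 1
            if this_attempt in drop_data:
                continue                      # timer expires, retransmit
            if seq == expected:               # receiver: new packet, deliver up
                delivered.append(msg)
                expected = 1 - expected
            # otherwise the packet is a duplicate: discarded but still ACKed
            if this_attempt in drop_ack:
                continue                      # ACK lost, timer expires, retransmit
            break                             # ACK for seq arrived
    return delivered, attempt
```

With the first data packet and the ACK of the third transmission lost, three messages still arrive exactly once, at the cost of five transmissions.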

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
range of sequence numbers must be increased
buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:
k-bit seq # in pkt header
"window" of up to N consecutive unACKed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
timer for each in-flight pkt
timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt: discard (don't buffer) -> no receiver buffering!
Re-ACK pkt with highest in-order seq #
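The Go-Back-N receiver is small enough to sketch in full, since it keeps a single counter. The class name `GbnReceiver` is hypothetical, and the sketch ignores corruption checks and the wraparound of k-bit sequence numbers.

```python
class GbnReceiver:
    """Go-Back-N receiver: remembers only expectedseqnum, never buffers."""

    def __init__(self):
        self.expected_seq_num = 0
        self.delivered = []

    def on_packet(self, seq: int, data):
        """Return the cumulative ACK to send, or None if nothing is in order yet."""
        if seq == self.expected_seq_num:
            self.delivered.append(data)
            self.expected_seq_num += 1
        # Out-of-order packets are discarded; re-ACK the highest in-order seq #.
        last_in_order = self.expected_seq_num - 1
        return last_in_order if last_in_order >= 0 else None
```

Feeding it packets 0, 2, 1, 2 produces ACKs 0, 0, 1, 2: the early copy of packet 2 is thrown away and only triggers a duplicate ACK for packet 0.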

Selective Repeat

receiver individually acknowledges all correctly received pkts
buffers pkts, as needed, for eventual in-order delivery to upper layer

sender only resends pkts for which ACK not received
sender timer for each unACKed pkt

sender window:
N consecutive seq #'s
again limits seq #s of sent, unACKed pkts

[Figure: Selective repeat: sender, receiver windows]

Selective repeat

sender

data from above: if next available seq # in window, send pkt
timeout(n): resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]: mark pkt n as received; if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver

pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
otherwise: ignore
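The receiver rules can be sketched as one method with three cases. A minimal illustration, assuming a hypothetical `SrReceiver` class with unbounded sequence numbers (no wraparound) and no corruption checks.

```python
class SrReceiver:
    """Selective-repeat receiver with window size N (seq # wraparound ignored)."""

    def __init__(self, window_size: int):
        self.n = window_size
        self.rcv_base = 0
        self.buffer = {}        # out-of-order packets awaiting delivery
        self.delivered = []

    def on_packet(self, seq: int, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= seq <= self.rcv_base + self.n - 1:
            self.buffer[seq] = data
            while self.rcv_base in self.buffer:        # deliver any in-order run
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.n <= seq <= self.rcv_base - 1:
            return seq                                  # already delivered: re-ACK
        return None
```

Unlike Go-Back-N, an out-of-order packet is buffered and individually ACKed, and a late duplicate below the window is ACKed again so the sender can advance.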



Rdt20 channel with bit errors

underlying channel may flip bits in packet checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) negative acknowledgements (NAKs) sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 error detection receiver feedback

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender adds sequence

number to each pkt sender retransmits current

pkt if ACKNAK garbled receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21

Sender seq added to pkt two seq rsquos (01) will

suffice must check if received

ACKNAK corrupted

Receiver must check if received

packet is duplicate note receiver can not

know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK duplicate ACK at sender results in same action as NAK

retransmit current pkt

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq ACKs

retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT too short premature

timeout too long slow reaction to

segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

SampleRTT will vary want estimated RTT ldquosmootherrdquo

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP reliable data transfer

TCP creates reliable data transfer service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks Retransmissions are triggered by

timeout events duplicate acks

TCP sender eventsdata rcvd from app Create segment with

seq seq is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment that

caused timeout restart timer

Ack rcvd If acknowledges

previously unacked segments update what is known to

be acked start timer if there are

outstanding segments

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment startsat lower end of gap

Fast Retransmit

Time-out period often relatively long
Detect lost segments via duplicate ACKs
If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        (a duplicate ACK for already ACKed segment)
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    (fast retransmit)
    }
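A minimal Python sketch of the ACK handler above. The class name `FastRetransmitSender` and its bookkeeping fields are hypothetical. Resetting the duplicate count when new data is ACKed is an assumption added here; the pseudocode keeps a separate count per value of y, which has the same effect.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records which segments get fast-retransmitted."""

    def __init__(self, send_base, outstanding=True):
        self.send_base = send_base
        self.outstanding = outstanding  # assumption: unACKed segments remain
        self.dup_acks = 0
        self.timer_restarts = 0
        self.retransmitted = []         # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0           # new data ACKed, start counting afresh
            if self.outstanding:
                self.timer_restarts += 1
        else:
            # a duplicate ACK for an already ACKed segment
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


s = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100]:   # one new ACK followed by three duplicates
    s.on_ack(ack)
print(s.retransmitted)
```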

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
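The sender-side rule can be written as a one-line check. This is only an illustration: the function name `can_send` and its parameters are hypothetical, chosen to mirror the LastByteSent and LastByteAcked names used in the congestion control slides.

```python
def can_send(last_byte_sent, last_byte_acked, rcv_window, segment_len):
    """True if sending segment_len more bytes keeps unACKed data within RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + segment_len <= rcv_window


# 2000 bytes are in flight; a 1000-byte segment fits a 4096-byte window
print(can_send(last_byte_sent=5000, last_byte_acked=3000,
               rcv_window=4096, segment_len=1000))
# but not a 2500-byte window
print(can_send(last_byte_sent=5000, last_byte_acked=3000,
               rcv_window=2500, segment_len=1000))
```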

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  specifies initial seq #
  no data

Step 2: server host receives SYN, replies with SYNACK segment
  server allocates buffers
  specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
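The three steps can be traced with a short Python sketch that builds the segments as dictionaries. The function name and dictionary layout are hypothetical. The rule that each side ACKs the other's initial seq # plus one comes from the TCP specification rather than from the slide, since a SYN consumes one sequence number.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the SYN, SYNACK and ACK segments; ISNs are random unless given."""
    if client_isn is None:
        client_isn = random.randrange(2**32)
    if server_isn is None:
        server_isn = random.randrange(2**32)
    # Step 1: client sends SYN with its initial seq #, no data
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK carrying its own initial seq #
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": (client_isn + 1) % 2**32}
    # Step 3: client ACKs the server's SYN (this segment may carry data)
    ack = {"flags": {"ACK"}, "seq": (client_isn + 1) % 2**32,
           "ack": (server_isn + 1) % 2**32}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=41, server_isn=78):
    print(segment)
```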

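TCP Connection Management (cont)

[Figure: closing a connection. The client closes and sends FIN; the server replies with ACK, closes, and sends its own FIN; the client replies with ACK and enters timed wait, after which the connection is closed]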

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle]

Principles of Congestion Control

Congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

two senders, two receivers
one router, infinite buffers
no retransmission

large delays when congested
maximum achievable throughput

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; the receivers see throughput $\lambda_{out}$]

Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of lost packet

[Figure: Host A and Host B feed a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: throughput at the receiver]

always: $\lambda_{in} = \lambda_{out}$ (goodput)
"perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:
  more work (retrans) for given "goodput"
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

when packet dropped, any upstream transmission capacity used for that packet was wasted!

[Figure: Host A and Host B send through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: throughput at the receiver]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

Network-assisted congestion control:
  routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)
sender limits transmission:
  $LastByteSent - LastByteAcked \le CongWin$
Roughly,
  $rate = \frac{CongWin}{RTT}$ bytes/sec
CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
  loss event = timeout or 3 duplicate ACKs
  TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  AIMD
  slow start
  conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window of a long-lived TCP connection plotted against time; y-axis marks at 8, 16 and 24 Kbytes]

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:
  double CongWin every RTT
  done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B; Host A sends one segment in the first RTT, two segments in the second, four segments in the third]

Refinement

After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly (congestion avoidance)

But after timeout event:
  CongWin instead set to 1 MSS
  window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
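The four rules can be sketched as a small state holder in Python. This is a simplified model under several assumptions: the window is counted in whole MSS units, growth is applied once per RTT rather than per ACK, slow start is capped at Threshold, and halving uses integer division with a floor of 1 MSS. The class and method names are hypothetical.

```python
class TcpCongestionControl:
    """Window measured in MSS units; one call per loss-free RTT or per loss event."""

    def __init__(self, threshold):
        self.cong_win = 1
        self.threshold = threshold

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            # slow start: double every RTT (capped at Threshold, a modelling choice)
            self.cong_win = min(2 * self.cong_win, self.threshold)
        else:
            # congestion avoidance: additive increase of 1 MSS per RTT
            self.cong_win += 1

    def on_triple_duplicate_ack(self):
        self.threshold = max(self.cong_win // 2, 1)
        self.cong_win = self.threshold

    def on_timeout(self):
        self.threshold = max(self.cong_win // 2, 1)
        self.cong_win = 1


cc = TcpCongestionControl(threshold=8)
history = [cc.cong_win]
for _ in range(5):
    cc.on_rtt_without_loss()
    history.append(cc.cong_win)
cc.on_triple_duplicate_ack()
history.append(cc.cong_win)
cc.on_timeout()
history.append(cc.cong_win)
print(history)
```

Printing `history` shows the exponential phase, the switch to linear growth at Threshold, and the different reactions to the two kinds of loss event.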

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
  Ignore slow start

Let W be the window size when loss occurs.
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2 RTT).
Average throughput: 0.75 W/RTT

TCP Futures

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate:

  $\frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$  Wow
New versions of TCP for high-speed needed!
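The numbers in this example can be checked with a few lines of Python. The variable names are illustrative; the only inputs are the segment size, RTT and target rate from the slide, and the loss rate is obtained by solving the throughput formula for L.

```python
MSS_BYTES = 1500      # segment size from the example
RTT_S = 0.100         # 100 ms round-trip time
TARGET_BPS = 10e9     # 10 Gbps target throughput

# Window needed to keep the pipe full: (rate * RTT) bits, expressed in segments
window_segments = TARGET_BPS * RTT_S / (MSS_BYTES * 8)

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L
loss_rate = (1.22 * MSS_BYTES * 8 / (RTT_S * TARGET_BPS)) ** 2

print(f"W = {window_segments:.0f} segments")
print(f"L = {loss_rate:.2e}")
```

The result is on the order of one lost segment in several billion, which is why the slide calls for new versions of TCP on high-speed paths.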

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
  Additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
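The convergence argument can be illustrated with a short simulation. It is a sketch under strong assumptions: both sessions see the same RTT, both detect loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units. The function name `simulate_fairness` is hypothetical.

```python
def simulate_fairness(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one bottleneck; returns their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows cut their rate in half,
            # which also halves the gap between them
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both grow by one unit, the gap is unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = simulate_fairness(x1=1.0, x2=80.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))
```

Because the gap never grows and shrinks by half at every loss event, the two rates approach the equal bandwidth share line regardless of where they start.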

Chapter 3: Summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation and implementation in the Internet:
  UDP
  TCP

Next:
  leaving the network "edge" (application, transport layers)
  into the network "core"


rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
  sender doesn't know what happened at receiver!
  can't just retransmit: possible duplicate

Handling duplicates:
  sender adds sequence number to each pkt
  sender retransmits current pkt if ACK/NAK garbled
  receiver discards (doesn't deliver up) duplicate pkt

stop and wait: Sender sends one packet, then waits for receiver response

rdt2.1

Sender:
  seq # added to pkt
  two seq. #'s (0,1) will suffice
  must check if received ACK/NAK corrupted

Receiver:
  must check if received packet is duplicate
  note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol

same functionality as rdt2.1, using ACKs only
instead of NAK, receiver sends ACK for last pkt received OK
duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits "reasonable" amount of time for ACK
  retransmits if no ACK received in this time
  if pkt (or ACK) just delayed (not lost):
    retransmission will be duplicate, but use of seq. #'s already handles this
    receiver must specify seq # of pkt being ACKed
  requires countdown timer
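The stop-and-wait behaviour with timeouts can be sketched as a deterministic simulation. Everything here is illustrative: the function name `rdt30_transfer` is hypothetical, losses are scripted by attempt number instead of being random, bit errors are left out, and a "timeout" is modelled as simply trying again.

```python
def rdt30_transfer(messages, drop_data=(), drop_ack=()):
    """Stop-and-wait with 1-bit seq #s over a channel that loses packets.

    drop_data / drop_ack hold the 0-based transmission attempt numbers
    (counted over the whole run) whose data packet or ACK is lost.
    Returns the messages delivered to the upper layer and the attempt count.
    """
    delivered = []
    expected = 0    # receiver: seq # it is waiting for
    seq = 0         # sender: seq # of the current packet
    attempt = 0
    for msg in messages:
        while True:
            data_lost = attempt in drop_data
            ack_lost = attempt in drop_ack
            attempt += 1
            if data_lost:
                continue            # no ACK arrives, timer expires, retransmit
            if seq == expected:
                delivered.append(msg)   # new packet: deliver up
                expected ^= 1
            # otherwise it is a duplicate: discard it, but still ACK seq
            if ack_lost:
                continue            # ACK lost, timer expires, retransmit
            break                   # ACK for the current seq # received
        seq ^= 1                    # alternate between 0 and 1
    return delivered, attempt


delivered, attempts = rdt30_transfer(["a", "b", "c"], drop_data={1}, drop_ack={3})
print(delivered, attempts)
```

In the run above, packet "b" is lost once and the ACK for "c" is lost once, so five transmissions deliver three messages, each exactly once.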

Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:
  k-bit seq # in pkt header
  "window" of up to N consecutive unACKed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  timer for each in-flight pkt
  timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer) -> no receiver buffering!
    Re-ACK pkt with highest in-order seq #
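The sender rules can be sketched in Python as follows. This is a simplified model, assuming unbounded integer sequence numbers (no k-bit wraparound) and a `wire` list that stands in for the channel. The class name and fields are hypothetical.

```python
class GoBackNSender:
    """Go-Back-N sender with window size N and unbounded seq #s."""

    def __init__(self, window_size):
        self.N = window_size
        self.base = 0        # oldest unACKed seq #
        self.next_seq = 0    # next seq # to use
        self.buffer = {}     # seq # -> data, kept for retransmission
        self.wire = []       # log of seq #s put on the channel

    def send(self, data):
        if self.next_seq >= self.base + self.N:
            return False     # window full: refuse the data
        self.buffer[self.next_seq] = data
        self.wire.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n):
        # cumulative ACK: all pkts up to and including n are acknowledged
        if n >= self.base:
            for seq in range(self.base, n + 1):
                self.buffer.pop(seq, None)
            self.base = n + 1

    def on_timeout(self, n):
        # retransmit pkt n and all higher seq # pkts in the window
        for seq in range(n, self.next_seq):
            self.wire.append(seq)


gbn = GoBackNSender(window_size=4)
accepted = [gbn.send(d) for d in "abcde"]   # the fifth pkt does not fit
gbn.on_ack(1)                               # ACKs pkts 0 and 1
gbn.on_timeout(2)                           # resend pkts 2 and 3
print(accepted, gbn.wire)
```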

Selective Repeat

receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer

sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt

sender window
  N consecutive seq #'s
  again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

[Figure: sender and receiver views of the sequence number space]

Selective repeat

sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N-1]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
  otherwise:
    ignore
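The receiver rules can be sketched in Python. As with any short model, this makes assumptions: sequence numbers are unbounded integers (no wraparound), and the method returns the seq # to ACK instead of sending anything. The class name and fields are hypothetical.

```python
class SelectiveRepeatReceiver:
    """Selective repeat receiver with window size N and unbounded seq #s."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcv_base = 0
        self.buffer = {}       # out-of-order pkts awaiting delivery
        self.delivered = []    # data passed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= n <= self.rcv_base + self.N - 1:
            self.buffer[n] = data
            # deliver this pkt and any buffered in-order pkts, advancing the window
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.N <= n <= self.rcv_base - 1:
            return n           # already delivered: re-ACK so the sender can advance
        return None            # otherwise: ignore


rcv = SelectiveRepeatReceiver(window_size=4)
acks = [rcv.on_packet(1, "b"),   # out of order: buffered and ACKed
        rcv.on_packet(0, "a"),   # fills the gap: both pkts delivered
        rcv.on_packet(0, "a"),   # duplicate from below the window: re-ACKed
        rcv.on_packet(9, "x")]   # outside both ranges: ignored
print(acks, rcv.delivered)
```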

TCP: Overview   RFCs: 793, 1122, 1323, 2018, 2581

point-to-point
reliable, in-order byte stream
pipelined
send & receive buffers
full duplex data
connection-oriented
flow controlled

[Figure: on the sending host the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer on the other host, where the application reads data through its socket door]

TCP segment structure

[Figure: segment layout, 32 bits wide, one row per line]

source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pointer
Options (variable length)
application data (variable length)

Field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  Receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
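The layout above can be unpacked with Python's standard `struct` module. This is a sketch, assuming the usual 20-byte fixed header in network byte order with the six flag bits in the low bits of the fourth 32-bit row. The function name `parse_tcp_header` and the returned dictionary keys are hypothetical.

```python
import struct

# Flag names by bit position, from bit 0 (FIN) up to bit 5 (URG)
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Split a raw TCP segment into its header fields, options and data."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len counts 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES)
             if offset_flags & (1 << bit)}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[20:header_len],
        "data": segment[header_len:],
    }


# Build a segment by hand: Seq=42, ACK=79, ACK flag set, one byte of data
raw = struct.pack("!HHIIHHHH", 12345, 23, 42, 79,
                  (5 << 12) | 0x10, 4096, 0, 0) + b"C"
print(parse_tcp_header(raw))
```

The hand-built segment uses a zero checksum for brevity, so it shows the field layout only and would not pass checksum verification on a real host.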

TCP seq. #'s and ACKs

Seq. #'s: byte stream "number" of first byte in segment's data
ACKs: seq # of next byte expected from other side

[Figure: simple telnet scenario, timeline between Host A and Host B]
  User types 'C'
  Host A -> Host B: Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C'
  Host B -> Host A: Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C'
  Host A -> Host B: Seq=43, ACK=80

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  longer than RTT
  too short: premature timeout
  too long: slow reaction to segment loss

Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
  SampleRTT will vary, want estimated RTT "smoother"

  $EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout
  EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
  first estimate of how much SampleRTT deviates from EstimatedRTT:

  $DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)

  Then set timeout interval:

  $TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
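The two moving averages and the timeout formula can be sketched as a small Python class. Several choices here are assumptions rather than part of the slides: the weight alpha = 0.125 and the initial deviation of half the first sample follow common practice in the TCP retransmission timer specification, and DevRTT is updated before EstimatedRTT so that it uses the previous estimate. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval (any time unit)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                  # assumption: typical weight for EstimatedRTT
        self.beta = beta                    # typical value given in the slides
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2     # assumption: initial deviation guess

    def update(self, sample_rtt):
        # DevRTT first, so the deviation is measured against the old estimate
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)      # milliseconds
for sample in [120.0, 110.0, 300.0, 105.0]:
    print(round(est.update(sample), 1))
```

A single outlier such as the 300 ms sample barely moves EstimatedRTT but raises DevRTT noticeably, which is how the safety margin widens when the samples vary.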

Page 14: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

rdt21

Sender seq added to pkt two seq rsquos (01) will

suffice must check if received

ACKNAK corrupted

Receiver must check if received

packet is duplicate note receiver can not

know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK duplicate ACK at sender results in same action as NAK

retransmit current pkt

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq ACKs

retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT too short premature

timeout too long slow reaction to

segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

SampleRTT will vary want estimated RTT ldquosmootherrdquo

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP reliable data transfer

TCP creates reliable data transfer service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks Retransmissions are triggered by

timeout events duplicate acks

TCP sender eventsdata rcvd from app Create segment with

seq seq is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment that

caused timeout restart timer

Ack rcvd If acknowledges

previously unacked segments update what is known to

be acked start timer if there are

outstanding segments

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment startsat lower end of gap

Fast Retransmit Time-out period often relatively long Detect lost segments via duplicate ACKs If sender receives 3 ACKs for the same data it supposes that

segment after ACKed data was lost

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

fast retransmita duplicate ACK for already ACKed segment

TCP Flow Control

receiver

speed-matching service matching the send rate to the receiving apprsquos drain rate

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

TCP Connection Management

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too much data

too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of

lost packet finite shared output link buffers

Host Ain original data

Host B

out

in original data plus retransmitted data

ldquocostsrdquo of congestion more work (retrans) for

given ldquogoodputrdquo unneeded retransmissions

link carries multiple copies of pkt

always (goodput)

ldquoperfectrdquo retransmission only when

loss

retransmission of delayed (not lost)

packet makes larger (than

perfect case) for same

in

out

=

in

out

gt

in

out

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit when packet dropped any

ldquoupstream transmission capacity used for that packet was wasted

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems

Two broad approaches towards congestion control

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission: $LastByteSent - LastByteAcked \le CongWin$

Roughly, $rate = \frac{CongWin}{RTT}$ bytes/sec

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate ACKs

TCP sender reduces rate (CongWin) after loss event

three mechanisms: AIMD, slow start, conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event

additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one burst per RTT]

Refinement

After 3 dup ACKs: CongWin is cut in half; window then grows linearly (congestion avoidance)

But after timeout event: CongWin instead set to 1 MSS; window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
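The four rules in this summary can be read as a small state machine. The sketch below is illustrative only: the class name `CongestionState`, its method names, the initial Threshold of 64 MSS, and the choice to cap slow-start growth exactly at Threshold are all assumptions made for the example, and the window is counted in whole segments (MSS = 1) and advanced once per RTT rather than once per ACK.

```python
from dataclasses import dataclass

MSS = 1  # assumption: measure the window in segments for readability


@dataclass
class CongestionState:
    cong_win: float = 1 * MSS
    threshold: float = 64 * MSS  # assumed starting value, not from the slides

    def on_rtt_without_loss(self) -> None:
        """Advance one RTT in which every segment was ACKed."""
        if self.cong_win < self.threshold:
            # slow start: double per RTT, capped at Threshold (assumption)
            self.cong_win = min(2 * self.cong_win, self.threshold)
        else:
            # congestion avoidance: add 1 MSS per RTT
            self.cong_win += MSS

    def on_triple_dup_ack(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1 * MSS


state = CongestionState(cong_win=1, threshold=8)
trace = []
for _ in range(5):
    state.on_rtt_without_loss()
    trace.append(state.cong_win)
# trace is [2, 4, 8, 9, 10]: exponential up to Threshold, then linear
```

A real sender updates CongWin on every ACK; stepping once per RTT keeps the two growth phases easy to see.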

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is W/RTT.

Just after loss, window drops to W/2, throughput to W/(2·RTT).

Average throughput: 0.75 W/RTT
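The 0.75 factor can be checked with one line of arithmetic. Assuming the window climbs linearly from W/2 back to W between losses and that RTT stays constant (both simplifications), the average window is the midpoint of the sawtooth:

```latex
\[
\overline{\text{throughput}}
  = \frac{1}{RTT}\cdot\frac{1}{2}\left(\frac{W}{2} + W\right)
  = \frac{3}{4}\,\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
\]
```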

TCP Futures

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

→ $L = 2 \cdot 10^{-10}$ Wow!

New versions of TCP for high-speed needed!
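The two numbers in this example can be reproduced directly. The sketch below assumes the throughput relation 1.22·MSS/(RTT·√L) and inverts it for L; the function names are made up for illustration.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target rate."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(f"W = {w:.0f} segments, L = {loss:.1e}")
# prints: W = 83333 segments, L = 2.1e-10
```

A loss rate near 2·10^-10 means roughly one lost segment in every five billion, which is why the slide calls for new versions of TCP on high-speed paths.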

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]

Chapter 3: Summary

principles behind transport layer services: multiplexing, demultiplexing; reliable data transfer; flow control; congestion control

instantiation and implementation in the Internet: UDP, TCP

Next: leaving the network “edge” (application, transport layers), into the network “core”

Page 15: Chapter 3: Transport Layer Our goals: understand principles behind transport layer services: multiplexing/demultiplexing, reliable data transfer.

rdt2.2: a NAK-free protocol

same functionality as rdt2.1, using ACKs only

instead of NAK, receiver sends ACK for last pkt received OK

duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)

checksum, seq #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits “reasonable” amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost): retransmission will be duplicate, but use of seq #'s already handles this

receiver must specify seq # of pkt being ACKed

requires countdown timer

Pipelined protocols

Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts

range of sequence numbers must be increased

buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:

k-bit seq # in pkt header

“window” of up to N consecutive unACKed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n (“cumulative ACK”)

timer for each in-flight pkt

timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #

may generate duplicate ACKs

need only remember expectedseqnum

out-of-order pkt: discard (don't buffer) -> no receiver buffering!

Re-ACK pkt with highest in-order seq #

Go-Back-N
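The sender rules above can be sketched as a small class. This is a toy model under stated assumptions: sequence numbers are unbounded integers (no k-bit wraparound), timers are not modelled (the caller invokes `on_timeout` itself), and `GoBackNSender` and `send_fn` are hypothetical names standing in for the sender and the unreliable channel.

```python
class GoBackNSender:
    """Toy Go-Back-N sender; send_fn(seq, payload) stands in for the channel."""

    def __init__(self, window_size, send_fn):
        self.N = window_size
        self.send_fn = send_fn
        self.base = 0        # oldest unACKed seq #
        self.next_seq = 0    # next seq # to assign
        self.buffer = {}     # seq # -> payload, kept until ACKed

    def send(self, payload):
        """Send if the window has room; return False to refuse the data."""
        if self.next_seq >= self.base + self.N:
            return False
        self.buffer[self.next_seq] = payload
        self.send_fn(self.next_seq, payload)
        self.next_seq += 1
        return True

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n < self.base or n >= self.next_seq:
            return  # duplicate or out-of-range ACK: nothing new
        for seq in range(self.base, n + 1):
            self.buffer.pop(seq, None)
        self.base = n + 1

    def on_timeout(self, n):
        """Retransmit pkt n and all higher seq # pkts still in the window."""
        for seq in range(max(n, self.base), self.next_seq):
            self.send_fn(seq, self.buffer[seq])


sent = []
sender = GoBackNSender(3, lambda seq, payload: sent.append(seq))
accepted = [sender.send(p) for p in "abcd"]  # fourth send finds the window full
sender.on_ack(0)       # window slides to base = 1
sender.on_timeout(1)   # pkts 1 and 2 go out again
# sent is [0, 1, 2, 1, 2]
```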

Selective Repeat

receiver individually acknowledges all correctly received pkts

buffers pkts, as needed, for eventual in-order delivery to upper layer

sender only resends pkts for which ACK not received

sender timer for each unACKed pkt

sender window: N consecutive seq #'s; again limits seq #'s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat

sender

data from above: if next available seq # in window, send pkt

timeout(n): resend pkt n, restart timer

ACK(n) in [sendbase, sendbase+N]: mark pkt n as received; if n is the smallest unACKed pkt, advance window base to next unACKed seq #

receiver

pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt

pkt n in [rcvbase-N, rcvbase-1]: ACK(n)

otherwise: ignore
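The receiver half of selective repeat is compact enough to sketch directly from the three cases listed. The class and attribute names here are illustrative, and sequence numbers are treated as unbounded integers rather than wrapping, which sidesteps the window-size-versus-sequence-space issue a real implementation must handle.

```python
class SelectiveRepeatReceiver:
    def __init__(self, window_size):
        self.N = window_size
        self.rcv_base = 0
        self.buffered = {}    # out-of-order pkts waiting for the gap to fill
        self.delivered = []   # payloads handed to the upper layer, in order

    def on_packet(self, n, payload):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= n <= self.rcv_base + self.N - 1:
            self.buffered[n] = payload
            # deliver the in-order run starting at rcv_base, sliding the window
            while self.rcv_base in self.buffered:
                self.delivered.append(self.buffered.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.N <= n <= self.rcv_base - 1:
            return n  # already delivered; re-ACK so the sender can move on
        return None


rx = SelectiveRepeatReceiver(window_size=4)
rx.on_packet(1, "b")   # out of order: buffered, ACK(1)
rx.on_packet(0, "a")   # fills the gap: delivers "a" then "b"
# rx.delivered is ["a", "b"] and rx.rcv_base is 2
```

Re-ACKing packets below the window matters because the earlier ACK may have been lost; without it the sender's window would never advance.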

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point

reliable, in-order byte stream

pipelined

send & receive buffers

full duplex data

connection-oriented

flow controlled

[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer; application reads data through the socket door]

TCP segment structure

[Figure: segment layout, 32 bits wide, top to bottom]

source port #, dest port #

sequence number

acknowledgement number

head len, not used, flags U A P R S F, Receive window

checksum, Urg data pointer

Options (variable length)

application data (variable length)

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now (generally not used)

RST, SYN, FIN: connection estab (setup, teardown commands)

Receive window: # bytes rcvr willing to accept

sequence and acknowledgement numbers: counting by bytes of data (not segments!)

checksum: Internet checksum (as in UDP)

TCP seq #'s and ACKs

Seq #'s: byte stream “number” of first byte in segment's data

ACKs: seq # of next byte expected from other side

[Figure: simple telnet scenario]

Host A to Host B: Seq=42, ACK=79, data = ‘C’ (user types ‘C’)

Host B to Host A: Seq=79, ACK=43, data = ‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)

Host A to Host B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)
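The numbers in the telnet scenario follow from one rule: the ACK field carries the sequence number of the next byte expected, that is, the received segment's sequence number plus its data length. A minimal sketch (the helper name `ack_for` is made up for illustration):

```python
def ack_for(seq, data):
    """ACK number = seq # of the next byte expected from the other side."""
    return seq + len(data)


a_seq, b_seq = 42, 79                # starting seq #'s from the scenario
ack_from_b = ack_for(a_seq, b"C")    # B received byte 42, so it ACKs 43
ack_from_a = ack_for(b_seq, b"C")    # A received the echo at byte 79, so it ACKs 80
print(ack_from_b, ack_from_a)        # prints: 43 80
```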

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT

too short: premature timeout

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

SampleRTT will vary, want estimated RTT “smoother”

$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$

Example RTT estimation

[Figure: RTT (milliseconds) versus time (seconds) from gaia.cs.umass.edu to fantasia.eurecom.fr, showing SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus “safety margin”

large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
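The estimator and timeout formulas combine into a few lines of state. This is a sketch under assumptions, not a reference implementation: the smoothing weight of 0.125 for EstimatedRTT is the conventional value rather than one stated here, the initial DevRTT of half the first sample is an arbitrary starting point, and EstimatedRTT is updated before DevRTT (implementations differ on that ordering). The class name is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                 # assumed conventional weight
        self.beta = beta                   # typical value 0.25
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2    # assumed initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.estimated_rtt = (
            (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        )
        self.dev_rtt = (
            (1 - self.beta) * self.dev_rtt
            + self.beta * abs(sample_rtt - self.estimated_rtt)
        )
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
t1 = est.update(100.0)   # steady sample: the margin shrinks
t2 = est.update(200.0)   # one slow sample widens the margin again
```

The factor of 4 on DevRTT is what keeps a single delayed ACK from triggering a premature timeout while still letting the interval shrink when samples are steady.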

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service

Pipelined segments

Cumulative ACKs

Retransmissions are triggered by: timeout events, duplicate ACKs

TCP sender events

data rcvd from app:

Create segment with seq #

seq # is byte-stream number of first data byte in segment

start timer if not already running (think of timer as for oldest unACKed segment)

expiration interval: TimeOutInterval

timeout:

retransmit segment that caused timeout

restart timer

ACK rcvd:

If it acknowledges previously unACKed segments: update what is known to be ACKed; start timer if there are outstanding segments

TCP retransmission scenarios

[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B replies ACK=100, which is lost (X); the Seq=92 timer expires and Host A resends Seq=92, 8 bytes data; Host B replies ACK=100 again; SendBase = 100]

[Figure: premature timeout. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timer expires before the ACKs arrive, so Host A resends Seq=92, 8 bytes data; Host B replies ACK=120; SendBase = 100, then 120]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK; wait up to 500 ms for next segment; if no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long

Detect lost segments via duplicate ACKs

If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    (a duplicate ACK for already ACKed segment)
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    (fast retransmit)
        }
    }
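The ACK-handling pseudocode translates almost line for line. In this sketch the timer is left out entirely, and `AckTracker` and the `resend_fn` callback are hypothetical names; the duplicate counter is reset whenever SendBase advances, which the pseudocode implies by counting duplicates "for y".

```python
class AckTracker:
    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base, resend_fn):
        self.send_base = send_base
        self.resend_fn = resend_fn   # hypothetical: resend segment starting at byte y
        self.dup_acks = 0

    def on_ack(self, y):
        if y > self.send_base:
            # new data acknowledged: slide SendBase, forget old duplicates
            self.send_base = y
            self.dup_acks = 0
        else:
            # duplicate ACK for an already ACKed segment
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.resend_fn(y)    # fast retransmit, before any timeout


resent = []
tracker = AckTracker(send_base=92, resend_fn=resent.append)
tracker.on_ack(100)          # advances SendBase to 100
for _ in range(3):
    tracker.on_ack(100)      # three duplicates trigger one retransmission
# resent is [100]
```

Comparing with `==` rather than `>=` means a fourth or fifth duplicate does not trigger further retransmissions of the same segment.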

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow
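The sender-side rule amounts to a single subtraction. In this sketch, unACKed data is measured as the gap between the last byte sent and the last byte acknowledged; the function and parameter names are illustrative.

```python
def sendable_bytes(last_byte_sent, last_byte_acked, rcv_window):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rcv_window - in_flight)


room = sendable_bytes(last_byte_sent=5000, last_byte_acked=3000, rcv_window=4096)
# 2000 bytes are unACKed, so 2096 more fit in the advertised window
blocked = sendable_bytes(last_byte_sent=8000, last_byte_acked=1000, rcv_window=4096)
# 7000 bytes unACKed already exceeds the window, so nothing more may be sent
```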

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data

Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

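TCP Connection Management (cont)

[Figure: closing a connection. Client closes and sends FIN; server replies ACK, then closes and sends its own FIN; client replies ACK and enters timed wait; connection closed]

TCP Connection Management (cont)

[Figure: TCP client lifecycle; TCP server lifecycle]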

Page 17: Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  • range of sequence numbers must be increased
  • buffering at sender and/or receiver

Two generic forms of pipelined protocols: Go-Back-N, selective repeat

Go-Back-N

Sender:
  • k-bit seq # in pkt header
  • "window" of up to N consecutive unACKed pkts allowed
  • ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  • timer for each in-flight pkt
  • timeout(n): retransmit pkt n and all higher seq # pkts in window

Receiver:
  • ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
      • may generate duplicate ACKs
      • need only remember expectedseqnum
  • out-of-order pkt:
      • discard (don't buffer) -> no receiver buffering!
      • re-ACK pkt with highest in-order seq #
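The Go-Back-N sender and receiver rules can be traced with a small round-based simulation. This is a minimal sketch under stated assumptions, not a full implementation: the function name `go_back_n` and the `lost` parameter (the set of transmission indices the channel drops) are hypothetical, ACKs are assumed never to be lost, sequence numbers do not wrap, and each loop iteration stands in for "send a window, then either slide or time out".

```python
def go_back_n(num_pkts, window, lost):
    """Simulate Go-Back-N. `lost` holds the indices (0-based, in send order)
    of data transmissions that the channel drops; ACKs are never lost.
    Returns (sequence numbers transmitted, sequence numbers delivered)."""
    base = 0            # oldest unACKed sequence number at the sender
    expected = 0        # receiver's expectedseqnum
    sent_log = []       # every transmission, including retransmissions
    delivered = []      # what the receiver passed to the upper layer
    tx_count = 0
    while base < num_pkts:
        acked_upto = base
        # Sender: transmit pkt base and all higher pkts the window allows.
        # After a loss this is exactly the "go back N" retransmission.
        for seq in range(base, min(base + window, num_pkts)):
            sent_log.append(seq)
            dropped = tx_count in lost
            tx_count += 1
            if dropped:
                continue
            # Receiver: accept only the expected pkt, discard out-of-order pkts.
            if seq == expected:
                delivered.append(seq)
                expected += 1
            # Cumulative (possibly duplicate) ACK for the highest in-order pkt.
            acked_upto = max(acked_upto, expected)
        # Sender: slide the window to the first unACKed pkt.
        base = acked_upto
    return sent_log, delivered


log, got = go_back_n(num_pkts=6, window=4, lost={2})
print(log)   # [0, 1, 2, 3, 2, 3, 4, 5]
print(got)   # [0, 1, 2, 3, 4, 5]
```

Packet 3 is transmitted twice even though its first copy arrived intact: the receiver discarded it as out-of-order, which is the price Go-Back-N pays for needing no receiver buffering.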

Selective Repeat

  • receiver individually acknowledges all correctly received pkts
      • buffers pkts, as needed, for eventual in-order delivery to upper layer
  • sender only resends pkts for which ACK not received
      • sender timer for each unACKed pkt
  • sender window
      • N consecutive seq #'s
      • again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows (figure not reproduced)

Selective repeat

Sender:
  • data from above: if next available seq # in window, send pkt
  • timeout(n): resend pkt n, restart timer
  • ACK(n) in [sendbase, sendbase+N]:
      • mark pkt n as received
      • if n smallest unACKed pkt, advance window base to next unACKed seq #

Receiver:
  • pkt n in [rcvbase, rcvbase+N-1]:
      • send ACK(n)
      • out-of-order: buffer
      • in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  • pkt n in [rcvbase-N, rcvbase-1]:
      • ACK(n)
  • otherwise: ignore
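The three receiver cases of selective repeat map directly onto code. The following is a minimal sketch, assuming sequence numbers that never wrap around; the class name `SRReceiver` and its attributes are hypothetical names chosen for illustration.

```python
class SRReceiver:
    """Selective-repeat receiver with window size n (no sequence wraparound)."""

    def __init__(self, n):
        self.n = n
        self.rcvbase = 0
        self.buffer = {}      # out-of-order pkts, keyed by sequence number
        self.delivered = []   # data passed to the upper layer, in order

    def receive(self, seq, data):
        """Handle pkt `seq`; return the ACK number sent, or None if ignored."""
        if self.rcvbase <= seq <= self.rcvbase + self.n - 1:
            self.buffer[seq] = data
            # Deliver the in-order run starting at rcvbase, advancing the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return seq
        if self.rcvbase - self.n <= seq <= self.rcvbase - 1:
            # Already delivered: re-ACK so the sender's window can advance.
            return seq
        return None


r = SRReceiver(4)
r.receive(1, "b")        # out of order: buffered, ACK(1) sent
r.receive(0, "a")        # fills the gap: delivers "a" then "b"
print(r.delivered)       # ['a', 'b']
```

The second case matters when an ACK was lost: the sender retransmits a packet the receiver has already delivered, and without the re-ACK the sender's window would never move.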

TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)

  • point-to-point
  • reliable, in-order byte stream
  • pipelined
  • send & receive buffers
  • full duplex data
  • connection-oriented
  • flow controlled

(Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its own socket door.)

TCP segment structure

Header and data, drawn 32 bits wide:
  • source port #, dest port #
  • sequence number
  • acknowledgement number
  • head len, not used, flag bits U A P R S F, receive window
  • checksum, urg data pointer
  • options (variable length)
  • application data (variable length)

Field notes:
  • URG: urgent data (generally not used)
  • ACK: ACK # valid
  • PSH: push data now (generally not used)
  • RST, SYN, FIN: connection estab (setup, teardown commands)
  • receive window: # bytes rcvr willing to accept
  • sequence and acknowledgement numbers: counting by bytes of data (not segments!)
  • checksum: Internet checksum (as in UDP)
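To make the field layout concrete, the fixed 20-byte part of the header can be packed and unpacked with Python's standard `struct` module. This is a sketch for illustration only: the helper names `pack_header` and `unpack_header` are hypothetical, options are omitted, and the checksum is left at zero rather than computed. The flag bit positions follow RFC 793.

```python
import struct

# Ports (16 bits each), seq and ack (32 bits each), head len + unused (8 bits),
# flag bits (8 bits), receive window, checksum, urgent data pointer (16 bits each).
TCP_HEADER = struct.Struct("!HHIIBBHHH")
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def pack_header(src, dst, seq, ack, flags, window, checksum=0, urg_ptr=0):
    """Build a 20-byte header; checksum is NOT computed in this sketch."""
    flag_byte = 0
    for name in flags:
        flag_byte |= FLAG_BITS[name]
    head_len_words = 5  # 5 x 32-bit words = 20 bytes, i.e. no options
    return TCP_HEADER.pack(src, dst, seq, ack, head_len_words << 4,
                           flag_byte, window, checksum, urg_ptr)


def unpack_header(raw):
    """Decode the first 20 bytes of a segment into a dict of fields."""
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urg_ptr) = TCP_HEADER.unpack(raw[:20])
    return {
        "src": src, "dst": dst, "seq": seq, "ack": ack,
        "head_len_bytes": (offset_byte >> 4) * 4,
        "flags": {n for n, bit in FLAG_BITS.items() if flag_byte & bit},
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
    }


hdr = pack_header(1234, 80, seq=42, ack=79, flags=["ACK", "PSH"], window=65535)
fields = unpack_header(hdr)
print(len(hdr), fields["seq"], fields["ack"], sorted(fields["flags"]))
```

The `!` prefix selects network (big-endian) byte order, which is how the fields appear on the wire.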

TCP seq. #'s and ACKs

Seq. #'s:
  • byte stream "number" of first byte in segment's data

ACKs:
  • seq # of next byte expected from other side

Simple telnet scenario (Host A, Host B, time running downward):
  1. User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
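The numbers in the telnet scenario follow from one rule: the ACK field carries the sequence number of the next byte expected. A short sketch, with the helper name `next_ack` chosen here for illustration:

```python
def next_ack(seq, data):
    """ACK number returned for a segment: the next byte the receiver expects."""
    return seq + len(data)


# Telnet exchange: A's byte stream is at 42, B's is at 79.
a_seq, b_seq = 42, 79
ack_from_b = next_ack(a_seq, b"C")   # B acknowledges A's 'C'
ack_from_a = next_ack(b_seq, b"C")   # A acknowledges the echoed 'C'
print(ack_from_b, ack_from_a)        # 43 80
```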

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
  • longer than RTT
  • too short: premature timeout
  • too long: slow reaction to segment loss

Q: how to estimate RTT?
  • SampleRTT: measured time from segment transmission until ACK receipt
  • SampleRTT will vary, want estimated RTT "smoother"

$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$

Example RTT estimation

(Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT, Estimated RTT.)

TCP Round Trip Time and Timeout

Setting the timeout:
  • EstimatedRTT plus "safety margin"
      • large variation in EstimatedRTT -> larger safety margin
  • first estimate of how much SampleRTT deviates from EstimatedRTT:

$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
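The two moving averages and the timeout rule fit in a few lines. This is a sketch, not a definitive implementation, and it makes several assumptions that go beyond the slides: the class name `RttEstimator` is hypothetical, alpha = 0.125 is the value recommended in RFC 6298, the estimator is seeded with the first sample and half of it as the initial deviation (also per RFC 6298), and DevRTT is updated before EstimatedRTT so that it uses the old estimate.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout; all times in the same unit."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = first_sample / 2     # assumption: initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT uses EstimatedRTT from before this sample is folded in.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)        # milliseconds
for sample in (100.0, 200.0):
    print(sample, est.update(sample))         # 250.0, then 325.0
```

The one outlier sample of 200 ms moves EstimatedRTT only to 112.5 ms, yet the timeout grows to 325 ms because DevRTT reacts to the deviation. That is the "larger safety margin" at work.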

TCP reliable data transfer

  • TCP creates reliable data transfer service on top of IP's unreliable service
  • Pipelined segments
  • Cumulative ACKs
  • Retransmissions are triggered by:
      • timeout events
      • duplicate ACKs

TCP sender events

Data rcvd from app:
  • Create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running (think of timer as for oldest unACKed segment)
  • expiration interval: TimeOutInterval

Timeout:
  • retransmit segment that caused timeout
  • restart timer

ACK rcvd:
  • If acknowledges previously unACKed segments:
      • update what is known to be ACKed
      • start timer if there are outstanding segments
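The three sender events can be sketched as methods on a small class. This is a simplified model under stated assumptions: the class name `SimpleTcpSender` and its attributes are hypothetical, the timer is a boolean rather than a real clock, "the segment that caused the timeout" is taken to be the oldest unACKed one, and flow and congestion control are left out.

```python
class SimpleTcpSender:
    """Simplified sender: one retransmission timer, no flow/congestion control."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}           # seq # of first byte -> segment data
        self.timer_running = False
        self.wire = []              # (seq, data) handed to IP, for inspection

    def data_from_app(self, data):
        # Create segment; its seq # is the number of its first data byte.
        self.unacked[self.next_seq] = data
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True

    def timeout(self):
        if not self.unacked:
            return
        # Retransmit the oldest unACKed segment and restart the timer.
        seq = min(self.unacked)
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True

    def ack_received(self, y):
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: every segment ending at or before y is ACKed.
            for seq in [s for s in self.unacked
                        if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            # Keep the timer only if segments are still outstanding.
            self.timer_running = bool(self.unacked)


s = SimpleTcpSender(initial_seq=92)
s.data_from_app(b"x" * 8)     # Seq=92, 8 bytes
s.data_from_app(b"y" * 20)    # Seq=100, 20 bytes
s.ack_received(120)           # cumulative ACK covers both segments
print(s.send_base, s.timer_running)   # 120 False
```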

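TCP retransmission scenarios

Lost ACK scenario (Host A sends, Host B receives):
  1. A -> B: Seq=92, 8 bytes data
  2. B -> A: ACK=100, lost (X)
  3. timeout for Seq=92 at A
  4. A -> B: Seq=92, 8 bytes data
  5. B -> A: ACK=100; SendBase = 100

Premature timeout scenario (Host A sends, Host B receives):
  1. A -> B: Seq=92, 8 bytes data
  2. A -> B: Seq=100, 20 bytes data
  3. B -> A: ACK=100, then ACK=120
  4. Seq=92 timeout at A fires before the ACKs arrive; A -> B: Seq=92, 8 bytes data
  5. the ACKs arrive: SendBase = 100, then SendBase = 120
  6. B -> A: ACK=120; SendBase = 120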

TCP retransmission scenarios (more)

Cumulative ACK scenario (Host A sends, Host B receives):
  1. A -> B: Seq=92, 8 bytes data
  2. A -> B: Seq=100, 20 bytes data
  3. B -> A: ACK=100, lost (X)
  4. B -> A: ACK=120, arrives before the timeout; SendBase = 120

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver, followed by TCP receiver action:

  1. Event: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
     Action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

  2. Event: arrival of in-order segment with expected seq #; one other segment has ACK pending.
     Action: immediately send single cumulative ACK, ACKing both in-order segments.

  3. Event: arrival of out-of-order segment with higher-than-expected seq #; gap detected.
     Action: immediately send duplicate ACK, indicating seq # of next expected byte.

  4. Event: arrival of segment that partially or completely fills gap.
     Action: immediately send ACK, provided that segment starts at lower end of gap.

Fast Retransmit

  • Time-out period often relatively long
  • Detect lost segments via duplicate ACKs
  • If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase)
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    else
        (a duplicate ACK for already ACKed segment)
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            (fast retransmit)
            resend segment with sequence number y
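The ACK-handling pseudocode translates almost line for line into Python. This is a sketch under assumptions: the class name `FastRetransmitSender` is hypothetical, a single counter tracks duplicates of the current SendBase only (ACKs older than SendBase are ignored, which is narrower than the pseudocode's `else`), and the timer is reduced to a comment.

```python
class FastRetransmitSender:
    """ACK handling with fast retransmit on the third duplicate ACK."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []    # seq #s resent by fast retransmit

    def ack_received(self, y):
        if y > self.send_base:
            # New data ACKed: advance SendBase and reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
            # (a full sender restarts the timer here if data is outstanding)
        elif y == self.send_base:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Fast retransmit: resend segment with sequence number y.
                self.retransmitted.append(y)


s = FastRetransmitSender(send_base=92)
s.ack_received(100)            # first ACK for byte 100: new data
for _ in range(3):
    s.ack_received(100)        # three duplicates follow
print(s.retransmitted)         # [100]
```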

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

  • speed-matching service: matching the send rate to the receiving app's drain rate
  • Rcvr advertises spare room by including value of RcvWindow in segments
  • Sender limits unACKed data to RcvWindow
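The sender-side rule reduces to one subtraction. A minimal sketch, with the function name `sendable_bytes` chosen here for illustration:

```python
def sendable_bytes(last_byte_sent, last_byte_acked, rcv_window):
    """Bytes the sender may still transmit without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data
    return max(0, rcv_window - in_flight)


print(sendable_bytes(5000, 3000, 4096))   # 2096
print(sendable_bytes(5000, 3000, 1000))   # 0: window already full
```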

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  • specifies initial seq #
  • no data

Step 2: server host receives SYN, replies with SYNACK segment
  • server allocates buffers
  • specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
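The sequence and acknowledgement numbers exchanged in the three steps can be written out explicitly. This is an illustrative sketch: the function name and dict layout are hypothetical, and it relies on the rule from RFC 793 (not stated on the slide) that a SYN consumes one sequence number, so each side ACKs the other's initial seq # plus one.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dicts of header fields."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": client_isn + 1}
    ack = {"flags": {"ACK"}, "seq": client_isn + 1,
           "ack": server_isn + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sorted(segment["flags"]), segment["seq"], segment.get("ack"))
```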

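TCP Connection Management (cont)

Closing a connection (figure: client and server timelines):
  1. client: close; client -> server: FIN
  2. server -> client: ACK
  3. server: close; server -> client: FIN
  4. client -> server: ACK; client enters timed wait
  5. closed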

TCP Connection Management (cont)

(Figures: TCP client lifecycle, TCP server lifecycle.)

Principles of Congestion Control

Congestion:
  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations:
      • lost packets (buffer overflow at routers)
      • long delays (queueing in router buffers)

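Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • no retransmission

Results:
  • large delays when congested
  • maximum achievable throughput

(Figure: Host A and Host B send original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; $\lambda_{out}$ is the rate delivered.)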

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of lost packet

(Figure: Host A, Host B, finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$.)

  • always: $\lambda_{in} = \lambda_{out}$ (goodput)
  • "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
  • retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:
  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

(Figure: Host A, Host B, finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$.)

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

Network-assisted congestion control:
  • routers provide feedback to end systems

TCP Congestion Control

  • end-end control (no network assistance)
  • sender limits transmission:
      $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \mathrm{CongWin}$
  • Roughly,
      $\mathrm{rate} = \dfrac{\mathrm{CongWin}}{\mathrm{RTT}}$ Bytes/sec
  • CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
  • loss event = timeout or 3 duplicate ACKs
  • TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  • AIMD
  • slow start
  • conservative after timeout events

TCP AIMD

  • multiplicative decrease: cut CongWin in half after loss event
  • additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

(Figure: congestion window of a long-lived TCP connection over time; y-axis ticks at 8 Kbytes, 16 Kbytes, 24 Kbytes.)

TCP Slow Start

  • When connection begins, increase rate exponentially until first loss event:
      • double CongWin every RTT
      • done by incrementing CongWin for every ACK received
  • Summary: initial rate is slow but ramps up exponentially fast

(Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT.)

Refinement

  • After 3 dup ACKs:
      • CongWin is cut in half
      • window then grows linearly (congestion avoidance)
  • But after timeout event:
      • CongWin instead set to 1 MSS
      • window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

  • When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
  • When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
  • When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
  • When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
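The four summary rules form a small state machine. The sketch below works in units of one MSS and advances one RTT at a time; it is an illustration under assumptions, not a definitive implementation. The class name `CongestionControl`, the initial Threshold of 8 MSS, capping slow-start growth at Threshold, and treating CongWin equal to Threshold as congestion avoidance are all choices made here, not taken from the slides.

```python
class CongestionControl:
    """CongWin and Threshold in units of MSS, updated once per RTT."""

    def __init__(self, threshold=8):
        self.cong_win = 1
        self.threshold = threshold      # hypothetical initial Threshold

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            # Slow start: double per RTT (capped at Threshold, an assumption).
            self.cong_win = min(2 * self.cong_win, self.threshold)
        else:
            # Congestion avoidance: additive increase of 1 MSS per RTT.
            self.cong_win += 1

    def on_triple_dup_ack(self):
        self.threshold = max(self.cong_win // 2, 1)
        self.cong_win = self.threshold

    def on_timeout(self):
        self.threshold = max(self.cong_win // 2, 1)
        self.cong_win = 1


cc = CongestionControl(threshold=8)
trace = []
for _ in range(5):
    cc.on_rtt_without_loss()
    trace.append(cc.cong_win)
print(trace)                              # [2, 4, 8, 9, 10]
cc.on_triple_dup_ack()
print(cc.cong_win, cc.threshold)          # 5 5
cc.on_timeout()
print(cc.cong_win, cc.threshold)          # 1 2
```

The trace shows the two growth regimes meeting at Threshold, and the different severity of the two loss signals: a triple duplicate ACK halves the window, while a timeout sends it back to 1 MSS.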

TCP throughput

  • What's the average throughput of TCP as a function of window size and RTT?
      • Ignore slow start
  • Let W be the window size when loss occurs.
  • When window is W, throughput is W/RTT
  • Just after loss, window drops to W/2, throughput to W/(2 RTT).
  • Average throughput: 0.75 W/RTT
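The 0.75 factor can be checked with one line of arithmetic. Assuming the window climbs linearly from W/2 back to W between loss events (the AIMD sawtooth), the time-averaged throughput is the midpoint of the two extremes:

```latex
\[
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
\]
```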

TCP Futures

  • Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
  • Requires window size W = 83,333 in-flight segments
  • Throughput in terms of loss rate:

$\mathrm{throughput} = \dfrac{1.22\cdot\mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$

  • $L = 2\cdot10^{-10}$ Wow
  • New versions of TCP for high-speed needed!
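Both numbers in this example can be reproduced directly. The sketch below is a worked calculation, with hypothetical function names; it takes W = rate x RTT / MSS for the window and solves throughput = 1.22 MSS / (RTT sqrt(L)) for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)
loss = loss_rate_for(10e9, 0.100, 1500)
print(round(w), f"{loss:.1e}")   # 83333 2.1e-10
```

A loss rate near 2 in 10 billion segments is far below what real paths deliver, which is why the slide calls for new versions of TCP at these speeds.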

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

(Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.)

Why is TCP fair?

Two competing sessions:
  • Additive increase gives slope of 1, as throughput increases
  • multiplicative decrease decreases throughput proportionally

(Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".)
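The convergence argument can be checked numerically. The sketch below is a toy model under strong assumptions: the function name `aimd_two_flows` is hypothetical, both sessions have the same RTT, both see every loss at the same instant, and a loss occurs exactly when their combined rate exceeds the capacity.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two synchronized AIMD flows on one link; rates in arbitrary units."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase, slope of 1
    return x1, x2


x1, x2 = aimd_two_flows(1.0, 60.0, capacity=100.0, rounds=500)
print(round(x1, 2), round(x2, 2), round(abs(x1 - x2), 4))
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the pair drifts toward the equal bandwidth share line no matter where it starts.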

Page 19: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

Receiver ACK-only always send ACK for correctly-received

pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Go-Back-N

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery to

upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed pkt

advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-1]

send ACK(n) out-of-order buffer in-order deliver (also deliver

buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data connection-oriented flow controlled

point-to-point reliable in-order byte

steam pipelined send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT too short premature

timeout too long slow reaction to

segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

SampleRTT will vary want estimated RTT ldquosmootherrdquo

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP reliable data transfer

TCP creates reliable data transfer service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks Retransmissions are triggered by

timeout events duplicate acks

TCP sender eventsdata rcvd from app Create segment with

seq seq is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment that

caused timeout restart timer

Ack rcvd If acknowledges

previously unacked segments update what is known to

be acked start timer if there are

outstanding segments

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment startsat lower end of gap

Fast Retransmit Time-out period often relatively long Detect lost segments via duplicate ACKs If sender receives 3 ACKs for the same data it supposes that

segment after ACKed data was lost

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

fast retransmita duplicate ACK for already ACKed segment

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow
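A small sketch of the two sides of flow control may help. The slide names only RcvWindow; the quantities `rcv_buffer`, `last_byte_rcvd`, and `last_byte_read` are assumptions taken from the usual textbook formulation (spare room equals buffer size minus bytes received but not yet read), and both function names are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises (assumed textbook formula)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, window):
    """Sender side: keep unACKed data, including the new bytes, within RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok_small = can_send(1000, last_byte_sent=5000, last_byte_acked=4000, window=window)
ok_large = can_send(1200, last_byte_sent=5000, last_byte_acked=4000, window=window)
```

With 2096 bytes of spare room and 1000 bytes already in flight, a further 1000 bytes fit but 1200 bytes do not.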

TCP Connection Management

Three-way handshake:

Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data

Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
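The three steps can be traced with a short sketch that only tracks flags and numbers. The `Segment` type and `three_way_handshake` function are hypothetical names; the sketch also assumes, as standard TCP does but the slide does not state, that a SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus one.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int
    ack_num: int = 0


def three_way_handshake(client_isn, server_isn):
    # Step 1: client sends SYN carrying its initial seq #, no data.
    syn = Segment(syn=True, ack=False, seq=client_isn)
    # Step 2: server replies with SYNACK carrying its own initial seq #.
    synack = Segment(syn=True, ack=True, seq=server_isn, ack_num=syn.seq + 1)
    # Step 3: client ACKs the SYNACK (this segment may also carry data).
    ack = Segment(syn=False, ack=True, seq=syn.seq + 1, ack_num=synack.seq + 1)
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```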

TCP Connection Management (cont)

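Figure (timeline between client and server): closing a connection. The client closes and sends FIN; the server replies with ACK, closes, and sends its own FIN; the client replies with ACK and enters timed wait, after which the connection is closed.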

TCP Connection Management (cont)

Figures: TCP client lifecycle; TCP server lifecycle

Principles of Congestion Control

Congestion:

informally: "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

two senders, two receivers

one router, infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; $\lambda_{out}$ is the delivery rate.

Causes/costs of congestion: scenario 2

one router, finite buffers

sender retransmission of lost packet

Figure: Host A and Host B share a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivery rate.

always: $\lambda_{in} = \lambda_{out}$ (goodput)

"perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$

retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:
more work (retrans) for given "goodput"
unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders

multihop paths

timeout/retransmit

when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Figure: Host A and Host B among the senders, with routers using finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivery rate.

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP

Network-assisted congestion control:
routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$$

Roughly,

$$\text{rate} = \frac{\text{CongWin}}{RTT} \text{ bytes/sec}$$

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (CongWin) after loss event

three mechanisms:
AIMD
slow start
conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event

additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

Figure: congestion window (axis marks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection.
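The additive-increase, multiplicative-decrease rule can be traced with a few lines of Python. This is an illustrative sketch only: the window is counted in whole MSS units, the function name `aimd_trace` is hypothetical, and the rounds in which a loss event occurs are supplied by hand rather than produced by a network model.

```python
def aimd_trace(initial_window, rounds, loss_rounds):
    """Return CongWin (in MSS) at the start of each RTT round."""
    cong_win = initial_window
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            cong_win = max(1, cong_win // 2)   # multiplicative decrease: halve
        else:
            cong_win += 1                      # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(initial_window=8, rounds=6, loss_rounds={3})
```

Here the window climbs 8, 9, 10, 11, is halved to 5 after the loss in round 3, and then resumes linear growth.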

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:
double CongWin every RTT
done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

Figure (timeline between Host A and Host B, one RTT per round): one segment, then two segments, then four segments.
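To see why incrementing CongWin once per ACK doubles it every RTT, consider this small sketch. It assumes every segment sent in a round is ACKed within that round and that no loss occurs; `slow_start_rounds` is a hypothetical helper name.

```python
def slow_start_rounds(rounds, mss=1):
    """Return the window size (in bytes if mss is in bytes) at the start of each RTT."""
    cong_win = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cong_win)
        acks_this_round = cong_win // mss   # one ACK per segment sent this round
        for _ in range(acks_this_round):
            cong_win += mss                 # increment CongWin for every ACK received
    return sizes


sizes = slow_start_rounds(4)
```

The first three rounds send one, two, and four segments, matching the figure.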

Refinement

After 3 dup ACKs:
CongWin is cut in half
window then grows linearly (congestion avoidance)

But after timeout event:
CongWin instead set to 1 MSS
window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
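The four rules in this summary map onto a small state holder. The sketch below works in whole MSS units and updates once per RTT rather than per ACK; the class and method names are hypothetical, and two details are assumptions not stated on the slide: growth during slow start is capped at Threshold, and Threshold is never allowed to fall below 1 MSS.

```python
class TcpCongestionControl:
    def __init__(self, threshold, mss=1):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            # Slow start: double per RTT (assumption: cap at Threshold).
            self.cong_win = min(2 * self.cong_win, self.threshold)
        else:
            # Congestion avoidance: grow by 1 MSS per RTT.
            self.cong_win += self.mss

    def on_triple_dup_ack(self):
        # Threshold = CongWin/2, CongWin = Threshold (assumption: floor of 1 MSS).
        self.threshold = max(self.cong_win // 2, self.mss)
        self.cong_win = self.threshold

    def on_timeout(self):
        # Threshold = CongWin/2, CongWin = 1 MSS.
        self.threshold = max(self.cong_win // 2, self.mss)
        self.cong_win = self.mss


cc = TcpCongestionControl(threshold=8)
for _ in range(4):
    cc.on_rtt_without_loss()
after_growth = cc.cong_win          # 1 -> 2 -> 4 -> 8 -> 9
cc.on_triple_dup_ack()
after_dup_acks = (cc.threshold, cc.cong_win)
cc.on_timeout()
after_timeout = (cc.threshold, cc.cong_win)
```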

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is $W/RTT$.

Just after loss, window drops to $W/2$, throughput to $W/(2 \cdot RTT)$.

Average throughput: $0.75 \cdot W/RTT$
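The factor 0.75 follows because the window ramps linearly from W/2 up to W between losses, so its time average is the midpoint, 3W/4. A quick numerical check, assuming one window sample per RTT and an even W (the function name `average_window` is hypothetical):

```python
def average_window(w):
    """Average of a window that grows by 1 per RTT from w/2 up to w."""
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)


avg = average_window(100)        # window ramps 50, 51, ..., 100
ratio = avg / 100                # fraction of the peak window W
```

Dividing the average window by RTT then gives the average throughput of 0.75 W/RTT.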

TCP Futures

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

This requires $L = 2 \cdot 10^{-10}$. Wow!

New versions of TCP for high-speed needed!
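The two numbers in this example can be reproduced directly. The sketch assumes the throughput relation throughput = 1.22 · MSS / (RTT · sqrt(L)) and solves it for L; the function names are hypothetical, and MSS is taken to be the full 1500-byte segment as in the example.

```python
def required_window_segments(throughput_bps, rtt_s, segment_bytes):
    """Segments that must be in flight to sustain the target rate (W = rate * RTT / size)."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
```

The window comes out at about 83,333 segments and the tolerable loss rate at roughly 2 · 10^-10, i.e. about one loss per five billion segments.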

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.

Why is TCP fair

Two competing sessions:
Additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally

Figure: Connection 2 throughput plotted against Connection 1 throughput, each axis running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".
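The convergence argument can be checked with a toy simulation. It assumes both connections have the same RTT, increase by the same amount each step, and see a loss at the same moment whenever their combined rate exceeds the link capacity; real flows are not this synchronized, and `aimd_two_flows` is a hypothetical name.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Evolve two AIMD flows sharing one bottleneck; returns final rates."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, the gap is unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


r1, r2 = aimd_two_flows(x1=1.0, x2=60.0, capacity=100.0, steps=500)
gap = abs(r1 - r2)
```

Additive increase moves the operating point parallel to the equal-share line, while each halving cuts the difference between the two rates in half, so the gap shrinks toward zero.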

Chapter 3 Summary

principles behind transport layer services:
multiplexing, demultiplexing
reliable data transfer
flow control
congestion control

instantiation and implementation in the Internet:
UDP
TCP

Next: leaving the network "edge" (application, transport layers), into the network "core"

  • Chapter 3 Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Principles of Reliable data transfer
  • rdt1.0: reliable transfer over a reliable channel
  • rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair
  • Chapter 3 Summary
Page 20: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

Selective Repeat

receiver individually acknowledges all correctly received pkts
buffers pkts, as needed, for eventual in-order delivery to upper layer

sender only resends pkts for which ACK not received
sender timer for each unACKed pkt

sender window
N consecutive seq #'s
again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat

sender

data from above:
if next available seq # in window, send pkt

timeout(n):
resend pkt n, restart timer

ACK(n) in [sendbase, sendbase+N-1]:
mark pkt n as received
if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver

pkt n in [rcvbase, rcvbase+N-1]:
send ACK(n)
out-of-order: buffer
in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt

pkt n in [rcvbase-N, rcvbase-1]:
ACK(n)

otherwise:
ignore
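The receiver rules can be sketched as a small class. This is a simplified illustration: sequence numbers are assumed to be unbounded integers (no wraparound), the return value stands in for "send ACK(n)", and the names `SelectiveRepeatReceiver`, `on_packet`, and `delivered` are hypothetical.

```python
class SelectiveRepeatReceiver:
    def __init__(self, window_size):
        self.n = window_size
        self.rcv_base = 0
        self.buffer = {}       # out-of-order packets awaiting delivery
        self.delivered = []    # data handed to the upper layer, in order

    def on_packet(self, seq, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= seq <= self.rcv_base + self.n - 1:
            # Inside the window: buffer it, then deliver any in-order run.
            self.buffer.setdefault(seq, data)
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.n <= seq <= self.rcv_base - 1:
            # Already delivered: re-ACK so the sender can advance its window.
            return seq
        return None            # otherwise: ignore


rcv = SelectiveRepeatReceiver(window_size=4)
ack_out_of_order = rcv.on_packet(1, "b")   # buffered, nothing delivered yet
ack_in_order = rcv.on_packet(0, "a")       # delivers "a" and the buffered "b"
ack_duplicate = rcv.on_packet(0, "a")      # old packet: ACK again, no redelivery
ack_far_ahead = rcv.on_packet(9, "x")      # outside both ranges: ignored
```

Re-ACKing packets below the window matters because the original ACK may have been lost, in which case the sender would otherwise keep retransmitting.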

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data
connection-oriented
flow controlled
point-to-point
reliable, in-order byte stream
pipelined
send & receive buffers

Figure: on each host the application writes data to, or reads data from, a socket door; behind the doors sit the TCP send buffer and the TCP receive buffer, with a segment travelling between the two hosts.

TCP segment structure

Figure: TCP segment layout, 32 bits wide, top to bottom:

source port #, dest port #
sequence number
acknowledgement number
head len, not used, flags U A P R S F, Receive window
checksum, Urg data pointer
Options (variable length)
application data (variable length)

Field notes:
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
Receive window: # bytes rcvr willing to accept
sequence number, acknowledgement number: counting by bytes of data (not segments)
checksum: Internet checksum (as in UDP)
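As a sketch of how the fixed part of this layout is read in practice, the 20-byte header without options can be unpacked with Python's `struct` module. The field widths used here (16-bit ports, 32-bit sequence and acknowledgement numbers, a 4-bit header length counted in 32-bit words, 16-bit window, checksum, and urgent pointer) come from the TCP specification rather than from the slide, and `parse_tcp_header` is a hypothetical helper name.

```python
import struct

FLAG_BITS = [("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
             ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01)]


def parse_tcp_header(raw):
    """Decode the first 20 bytes of a TCP segment (network byte order)."""
    (src_port, dst_port, seq, ack_num,
     offset_and_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack_num": ack_num,
        "head_len": (offset_and_flags >> 12) * 4,   # header length in bytes
        "flags": {name: bool(offset_and_flags & bit) for name, bit in FLAG_BITS},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# Build a sample SYNACK header by hand: head len = 5 words, SYN and ACK set.
sample = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x12, 4096, 0, 0)
header = parse_tcp_header(sample)
```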

TCP seq. #'s and ACKs

Seq. #'s:
byte stream "number" of first byte in segment's data

ACKs:
seq # of next byte expected from other side

Figure (timeline between Host A and Host B): simple telnet scenario.
User types 'C'; Host A sends Seq=42, ACK=79, data = 'C'
Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
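The numbers in the telnet scenario follow mechanically from the two rules above, as this sketch shows. The function name `telnet_echo` and the dictionary layout are hypothetical; the starting values 42 and 79 are the ones from the figure.

```python
def telnet_echo(a_seq, b_seq, char):
    """Return the three segments exchanged when A sends one character and B echoes it."""
    n = len(char.encode())                       # bytes of data carried
    # A -> B: A's next byte is a_seq; A expects byte b_seq from B.
    first = {"seq": a_seq, "ack": b_seq, "data": char}
    # B -> A: B ACKs A's byte by asking for a_seq + n, and echoes the data.
    echo = {"seq": b_seq, "ack": a_seq + n, "data": char}
    # A -> B: A ACKs the echoed byte; this segment carries no data.
    final = {"seq": a_seq + n, "ack": b_seq + n, "data": ""}
    return first, echo, final


first, echo, final = telnet_echo(42, 79, "C")
```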

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
longer than RTT
too short: premature timeout
too long: slow reaction to segment loss

Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
SampleRTT will vary, want estimated RTT "smoother"

$$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$$

Example RTT estimation

Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. Horizontal axis: time (seconds), 1 to 106. Vertical axis: RTT (milliseconds), 100 to 350. Series: SampleRTT and EstimatedRTT.

TCP Round Trip Time and Timeout

Setting the timeout
EstimatedRTT plus "safety margin"
large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

$$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$$

(typically, $\beta = 0.25$)

Then set timeout interval:

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$$
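The timeout computation can be sketched as a small estimator: an exponentially weighted average of SampleRTT, a weighted average of its deviation with weight 0.25, and a timeout of EstimatedRTT plus four times DevRTT. Several details below are assumptions not given on the slide: the weight `alpha=0.125` for EstimatedRTT, the initial DevRTT of half the first sample, and updating DevRTT before EstimatedRTT (these follow common practice). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                  # assumed weight for EstimatedRTT
        self.beta = beta                    # typical weight for DevRTT
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2     # assumed initial deviation

    def update(self, sample_rtt):
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)      # milliseconds
t1 = est.update(100.0)                      # steady sample: margin shrinks
t2 = est.update(200.0)                      # spike: estimate and margin both grow
```

A single 200 ms spike moves EstimatedRTT only to 112.5 ms, but the larger deviation widens the safety margin, so the timeout rises from 250 ms to 325 ms.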

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service

Pipelined segments
Cumulative acks
Retransmissions are triggered by:
timeout events
duplicate acks

TCP sender events

data rcvd from app:
Create segment with seq #
seq # is byte-stream number of first data byte in segment
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval: TimeOutInterval

timeout:
retransmit segment that caused timeout
restart timer

ACK rcvd:
If acknowledges previously unacked segments: update what is known to be acked; start timer if there are outstanding segments

Page 22: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

Selective repeat

sender
data from above: if next available seq # in window, send pkt
timeout(n): resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]: mark pkt n as received; if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
otherwise: ignore
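The receiver rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SRReceiver`, its attributes, and the use of unbounded sequence numbers (no wrap-around) are hypothetical choices, not part of the slides.

```python
class SRReceiver:
    """Selective-repeat receiver with window [rcv_base, rcv_base + n - 1]."""

    def __init__(self, window_size):
        self.n = window_size
        self.rcv_base = 0
        self.buffer = {}      # out-of-order packets, keyed by sequence number
        self.delivered = []   # payloads handed to the upper layer, in order

    def on_packet(self, seq, payload):
        """Return the sequence number to ACK, or None if the packet is ignored."""
        if self.rcv_base <= seq <= self.rcv_base + self.n - 1:
            self.buffer[seq] = payload
            # Deliver the in-order run starting at rcv_base and slide the window.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.n <= seq <= self.rcv_base - 1:
            return seq  # already delivered: re-ACK so the sender can advance
        return None     # otherwise: ignore


receiver = SRReceiver(window_size=4)
# Packet 2 arrives before packet 1, packet 0 is duplicated, packet 9 is far outside.
acks = [receiver.on_packet(seq, f"pkt{seq}") for seq in (0, 2, 1, 0, 9)]
print(acks, receiver.delivered, receiver.rcv_base)
```

Packet 2 is buffered until packet 1 fills the gap, the duplicate of packet 0 is re-ACKed without being delivered again, and packet 9 is ignored.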

TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)

full duplex data
connection-oriented
flow controlled
point-to-point
reliable, in-order byte stream
pipelined
send & receive buffers

[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]

TCP segment structure

Header layout (each row is 32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pointer
Options (variable length)
application data (variable length)

Field notes:
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
Receive window: # bytes rcvr willing to accept
sequence number, acknowledgement number: counting by bytes of data (not segments)
checksum: Internet checksum (as in UDP)
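As a rough illustration of the layout above, the fixed 20-byte part of the header can be decoded with Python's standard `struct` module. The function name `parse_tcp_header` and the dictionary keys are hypothetical; options are skipped rather than interpreted.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len field counts 32-bit words
    flag_bits = offset_flags & 0x3F         # low six bits: U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 upward
    flags = {name for bit, name in enumerate(names) if flag_bits & (1 << bit)}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "payload": segment[header_len:],   # options, if any, are skipped
    }


# Build a sample segment: head len 5 words, PSH+ACK set, one byte of data.
raw = struct.pack("!HHIIHHHH", 1234, 23, 42, 79, (5 << 12) | 0x18, 4096, 0, 0) + b"C"
hdr = parse_tcp_header(raw)
print(hdr["seq"], hdr["ack"], sorted(hdr["flags"]), hdr["payload"])
```

The checksum in the sample is left at zero; verifying it would need the IP pseudo-header, which is outside this sketch.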

TCP seq. #'s and ACKs

Seq. #'s: byte stream "number" of first byte in segment's data
ACKs: seq # of next byte expected from other side

simple telnet scenario (Host A, Host B, time running downward):
User types 'C': A -> B, Seq=42, ACK=79, data = 'C'
host ACKs receipt of 'C', echoes back 'C': B -> A, Seq=79, ACK=43, data = 'C'
host ACKs receipt of echoed 'C': A -> B, Seq=43, ACK=80
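The numbers in the telnet scenario follow from one rule: the ACK field is the sequence number of the segment received plus the number of data bytes it carried. A minimal sketch, with the helper name `next_ack` and the dictionary layout chosen only for illustration:

```python
def next_ack(seq, payload):
    """ACK number sent after receiving `payload` whose first byte is numbered `seq`."""
    return seq + len(payload)


# Host A starts at byte 42 and Host B at byte 79, as in the scenario.
a_to_b = {"seq": 42, "ack": 79, "data": b"C"}                      # user types 'C'
b_to_a = {"seq": a_to_b["ack"],                                    # B continues its own stream
          "ack": next_ack(a_to_b["seq"], a_to_b["data"]),          # B expects byte 43 next
          "data": b"C"}                                            # echo
a_final = {"seq": b_to_a["ack"],
           "ack": next_ack(b_to_a["seq"], b_to_a["data"]),         # A expects byte 80 next
           "data": b""}
print(b_to_a, a_final)
```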

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
longer than RTT
too short: premature timeout
too long: slow reaction to segment loss

Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
SampleRTT will vary, want estimated RTT "smoother"

$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; two curves: SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout
EstimatedRTT plus "safety margin"
large variation in EstimatedRTT -> larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT:

$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
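The two moving averages and the timeout rule can be combined into a small estimator. This is a sketch, not a reference implementation: the class name `RttEstimator` is hypothetical, the smoothing weight 0.125 for EstimatedRTT is an assumed value that the slides do not state, and the initial DevRTT (half the first sample) and the order of the two updates are likewise assumptions.

```python
class RttEstimator:
    """Exponentially weighted RTT estimate plus a deviation-based timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                 # weight of the newest sample in EstimatedRTT (assumed)
        self.beta = beta                   # weight of the newest deviation in DevRTT
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2    # assumed starting deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT is updated against the previous EstimatedRTT (ordering is an assumption).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
timeout = est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, timeout)
```

With these assumed starting values, one 120 ms sample after a 100 ms first sample moves EstimatedRTT to 102.5 ms and DevRTT to 42.5 ms, so the timeout is 272.5 ms.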

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service
Pipelined segments
Cumulative acks
Retransmissions are triggered by: timeout events, duplicate acks

TCP sender events

data rcvd from app:
Create segment with seq #
seq # is byte-stream number of first data byte in segment
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval: TimeOutInterval

timeout:
retransmit segment that caused timeout
restart timer

Ack rcvd:
If acknowledges previously unacked segments: update what is known to be acked; start timer if there are outstanding segments

TCP retransmission scenarios

[Figure: two timelines between Host A and Host B]

lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); the timer for Seq=92 times out; A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100

premature timeout: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; the timer for Seq=92 times out before the ACKs arrive; A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase = 100, then SendBase = 120

TCP retransmission scenarios (more)

[Figure: timeline between Host A and Host B]

Cumulative ACK scenario: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B's ACK=100 is lost (X); ACK=120 arrives before the timeout; SendBase = 120

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP Receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

Event at Receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP Receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at Receiver: Arrival of out-of-order segment, higher-than-expect seq #. Gap detected
TCP Receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte

Event at Receiver: Arrival of segment that partially or completely fills gap
TCP Receiver action: Immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long
Detect lost segments via duplicate ACKs
If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    (a duplicate ACK for already ACKed segment)
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3)
            resend segment with sequence number y    (fast retransmit)
    }
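The ACK-handling logic of fast retransmit translates almost line for line into Python. In this sketch the class name `FastRetransmitSender` and the `retransmitted` list are hypothetical, and the retransmission timer is left out so that only the duplicate-ACK counting is shown.

```python
from collections import Counter


class FastRetransmitSender:
    """Tracks SendBase and triggers a resend on the third duplicate ACK."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = Counter()   # count of duplicate ACKs seen per ACK value y
        self.retransmitted = []     # sequence numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase.
            # (A real sender would also restart its timer here if data is outstanding.)
            self.send_base = y
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


sender = FastRetransmitSender(send_base=92)
for y in (100, 100, 100, 100, 120):
    sender.on_ack(y)
print(sender.send_base, sender.retransmitted)
```

The first ACK=100 advances SendBase; the next three are duplicates, and the third one triggers the resend of the segment starting at byte 100 without waiting for a timeout.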

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
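A small model makes the two sides of flow control concrete. It assumes the usual bookkeeping in which spare room is the buffer capacity minus the bytes received but not yet read by the application; the names `ReceiveBuffer` and `sendable` are hypothetical.

```python
class ReceiveBuffer:
    """Receiver side: tracks how much spare room can be advertised."""

    def __init__(self, capacity):
        self.capacity = capacity   # total buffer size in bytes
        self.buffered = 0          # bytes received but not yet read by the app

    def rcv_window(self):
        return self.capacity - self.buffered

    def receive(self, nbytes):
        if nbytes > self.rcv_window():
            raise ValueError("sender exceeded the advertised window")
        self.buffered += nbytes

    def app_read(self, nbytes):
        taken = min(nbytes, self.buffered)
        self.buffered -= taken
        return taken


def sendable(rcv_window, last_byte_sent, last_byte_acked):
    """Sender side: bytes that may still be put in flight without exceeding RcvWindow."""
    return max(0, rcv_window - (last_byte_sent - last_byte_acked))


buf = ReceiveBuffer(capacity=4096)
buf.receive(3000)
window_before = buf.rcv_window()              # spare room while the app is slow
room = sendable(window_before, 3500, 3000)    # 500 bytes already in flight
buf.app_read(2000)
window_after = buf.rcv_window()               # the app drained data, so the window reopens
print(window_before, room, window_after)
```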

TCP Connection Management

Three way handshake:
Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data
Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
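The three steps can be written out as the segments exchanged. This sketch relies on one fact the slides do not spell out, namely that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one. The function name and the dictionary layout are hypothetical.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as plain dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}                       # step 1: no data
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": syn["seq"] + 1}                                  # step 2
    ack = {"flags": {"ACK"}, "seq": syn["seq"] + 1,
           "ack": synack["seq"] + 1}                                  # step 3: may carry data
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```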

TCP Connection Management (cont)

[Figure: closing a connection. client: close, sends FIN; server: replies ACK, then close, sends FIN; client: replies ACK and enters timed wait; closed]

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle]

Principles of Congestion Control

Congestion:
informally: "too many sources sending too much data too fast for network to handle"
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

two senders, two receivers
one router, infinite buffers
no retransmission
large delays when congested
maximum achievable throughput

[Figure: Host A and Host B send into a router with unlimited shared output link buffers; $\lambda_{in}$: original data; $\lambda_{out}$]

Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of lost packet

[Figure: Host A and Host B send into a router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]

always: $\lambda_{in} = \lambda_{out}$ (goodput)
"perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:
more work (retrans) for given "goodput"
unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit
when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP

Network-assisted congestion control:
routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)
sender limits transmission: $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \mathrm{CongWin}$
Roughly, $\mathrm{rate} = \mathrm{CongWin} / \mathrm{RTT}$ Bytes/sec
CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (CongWin) after loss event

three mechanisms:
AIMD
slow start
conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: Long-lived TCP connection. congestion window (8 Kbytes, 16 Kbytes, 24 Kbytes) versus time]

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:
double CongWin every RTT
done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT; time running downward]

Refinement

After 3 dup ACKs:
CongWin is cut in half
window then grows linearly (congestion avoidance)

But after timeout event:
CongWin instead set to 1 MSS
window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
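The four rules in this summary fit into a compact state model. The sketch below counts the window in segments (1 MSS = 1) and advances one whole RTT at a time; the class name, the per-RTT granularity, and the choice to cap slow-start growth at Threshold are assumptions made for illustration.

```python
MSS = 1  # count the window in segments to keep the numbers readable


class CongestionControl:
    """Window evolution per RTT following the four summary rules."""

    def __init__(self, threshold):
        self.cong_win = 1 * MSS
        self.threshold = threshold

    def on_rtt_without_loss(self):
        if self.cong_win < self.threshold:
            # Slow start: double per RTT, capped at Threshold (assumption).
            self.cong_win = min(2 * self.cong_win, self.threshold)
        else:
            # Congestion avoidance: linear growth of 1 MSS per RTT.
            self.cong_win += MSS

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = 1 * MSS


cc = CongestionControl(threshold=8)
history = []
for _ in range(5):
    cc.on_rtt_without_loss()
    history.append(cc.cong_win)
cc.on_triple_dup_ack()
after_dup = (cc.threshold, cc.cong_win)
cc.on_rtt_without_loss()
cc.on_timeout()
after_timeout = (cc.threshold, cc.cong_win)
print(history, after_dup, after_timeout)
```

The window doubles up to the threshold of 8, then grows by one segment per RTT; the triple duplicate ACK halves it, while the timeout sends it back to one segment.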

TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
Ignore slow start
Let W be the window size when loss occurs.
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2 RTT).
Average throughput: 0.75 W/RTT
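The 0.75 factor is the mean of a window that ramps linearly from W/2 up to W and then repeats. A quick numeric check, assuming the window grows by one segment per RTT and W is even (the function name is hypothetical):

```python
def sawtooth_average_throughput(w, rtt):
    """Mean throughput (segments per second) over one W/2 -> W sawtooth cycle."""
    windows = range(w // 2, w + 1)          # one window value per RTT in the cycle
    mean_window = sum(windows) / len(windows)
    return mean_window / rtt


avg = sawtooth_average_throughput(w=100, rtt=0.5)
expected = 0.75 * 100 / 0.5
print(avg, expected)
```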

TCP Futures

Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
Requires window size W = 83,333 in-flight segments
Throughput in terms of loss rate:

$\mathrm{throughput} = \dfrac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\sqrt{L}}$

$L = 2 \cdot 10^{-10}$ Wow
New versions of TCP for high-speed needed!
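Both numbers in this example can be reproduced with a few lines of arithmetic, using the loss-rate relation throughput = 1.22 MSS / (RTT sqrt(L)) solved for L. The function names are hypothetical.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(round(w), loss)
```

The window comes out at about 83,333 segments and the tolerable loss rate at roughly 2 in 10 billion segments, which is why the slide calls for new TCP versions on high-speed paths.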

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
Additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput (0 to R) plotted against Connection 1 throughput (0 to R), with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
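The convergence argument can be checked with a toy simulation of two AIMD flows. It assumes both flows have the same RTT, see every loss at the same moment, and lose whenever their combined rate exceeds the capacity; the function name and the starting rates are hypothetical.

```python
def simulate_aimd(x1, x2, capacity, steps):
    """Two flows: add 1 each step, halve both when the link is overloaded."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease halves the gap too
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap unchanged
    return x1, x2


x1, x2 = simulate_aimd(x1=1.0, x2=70.0, capacity=100.0, steps=500)
print(x1, x2, abs(x1 - x2))
```

Additive increase leaves the difference between the two rates untouched while every loss halves it, so the flows drift toward the equal bandwidth share line even from a very uneven start.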

Chapter 3: Summary

principles behind transport layer services:
multiplexing, demultiplexing
reliable data transfer
flow control
congestion control

instantiation and implementation in the Internet:
UDP
TCP

Next:
leaving the network "edge" (application, transport layers)
into the network "core"


Page 24: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next byte

expected from other side

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

longer than RTT too short premature

timeout too long slow reaction to

segment loss

Q how to estimate RTT SampleRTT measured time from

segment transmission until ACK receipt

SampleRTT will vary want estimated RTT ldquosmootherrdquo

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP reliable data transfer

TCP creates reliable data transfer service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks Retransmissions are triggered by

timeout events duplicate acks

TCP sender eventsdata rcvd from app Create segment with

seq seq is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment that

caused timeout restart timer

Ack rcvd If acknowledges

previously unacked segments update what is known to

be acked start timer if there are

outstanding segments

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment startsat lower end of gap

Fast Retransmit Time-out period often relatively long Detect lost segments via duplicate ACKs If sender receives 3 ACKs for the same data it supposes that

segment after ACKed data was lost

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

fast retransmita duplicate ACK for already ACKed segment

TCP Flow Control

receiver

speed-matching service matching the send rate to the receiving apprsquos drain rate

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

TCP Connection Management

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too much data

too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of

lost packet finite shared output link buffers

Host Ain original data

Host B

out

in original data plus retransmitted data

ldquocostsrdquo of congestion more work (retrans) for

given ldquogoodputrdquo unneeded retransmissions

link carries multiple copies of pkt

always (goodput)

ldquoperfectrdquo retransmission only when

loss

retransmission of delayed (not lost)

packet makes larger (than

perfect case) for same

in

out

=

in

out

gt

in

out

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit when packet dropped any

ldquoupstream transmission capacity used for that packet was wasted

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems

Two broad approaches towards congestion control

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission LastByteSent-LastByteAcked

CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

TCP Slow Start

When connection begins increase rate exponentially until first loss event double CongWin every

RTT done by incrementing CongWin for every ACK received

Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Refinement

After 3 dup ACKs CongWin is cut in half window then grows linearly (congestion avoidance)

But after timeout event CongWin instead set to 1 MSS window then grows exponentially (slow start) to a threshold then grows linearly (congestion

avoidance)

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start

Let W be the window size when loss occurs. When window is W, throughput is W/RTT

Just after loss, window drops to W/2, throughput to W/(2 RTT)

Average throughput: 0.75 W/RTT
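The 0.75 factor follows if the window is assumed to grow linearly from W/2 back to W between loss events, so that the average throughput is the midpoint of the two extremes:

```latex
\[
\text{average throughput}
= \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
= \frac{3}{4}\,\frac{W}{\mathrm{RTT}}
= 0.75\,\frac{W}{\mathrm{RTT}}
\]
```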

TCP Futures

Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate:

$\dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

-> $L = 2 \cdot 10^{-10}$ Wow!

New versions of TCP for high-speed needed!
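The two numbers in this example can be checked with a few lines of arithmetic. The sketch below assumes the throughput relation 1.22·MSS / (RTT·√L) with MSS converted to bits; the function names are made up for the illustration.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    root_l = 1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)
    return root_l ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(int(w))          # 83333
print(f"{loss:.1e}")   # 2.1e-10
```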

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions: additive increase gives slope of 1, as throughput increases; multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
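The convergence argument can be tried numerically. The sketch below is a deliberately idealized model, assuming both sessions have the same RTT, increase by one unit per round, and both halve their rate whenever the combined rate exceeds the link capacity; `two_flow_aimd` and its parameters are hypothetical names.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Idealized AIMD for two sessions sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both sessions decrease by a factor of 2,
            # which also halves the gap between them
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both move up with slope 1,
            # which leaves the gap unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
gap = abs(x1 - x2)
print(round(x1, 3), round(x2, 3), gap < 0.01)
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the rates drift toward the equal bandwidth share line.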

Chapter 3: Summary

principles behind transport layer services: multiplexing, demultiplexing; reliable data transfer; flow control; congestion control

instantiation and implementation in the Internet: UDP, TCP

Next: leaving the network "edge" (application, transport layers), into the network "core"

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3: Summary
TCP seq. #'s and ACKs

Seq. #'s: byte stream "number" of first byte in segment's data

ACKs: seq # of next byte expected from other side

Simple telnet scenario (Host A, Host B):

User types 'C'; Host A sends Seq=42, ACK=79, data = 'C'

Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'

Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
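The numbers in the telnet scenario follow from one rule: the ACK field carries the sequence number of the next byte expected. A minimal sketch, with `ack_for` as a made-up helper name:

```python
def ack_for(seq, data):
    """ACK number a receiver returns: seq # of the next byte it expects."""
    return seq + len(data)


a_seq, b_seq = 42, 79                # starting sequence numbers in the scenario

# Host A -> Host B: Seq=42, data='C' (one byte)
ack_from_b = ack_for(a_seq, b"C")
# Host B -> Host A: echoes 'C' with Seq=79
ack_from_a = ack_for(b_seq, b"C")

print(ack_from_b, ack_from_a)  # 43 80
```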

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT

too short: premature timeout

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt

SampleRTT will vary, want estimated RTT "smoother"

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106); curves: SampleRTT, EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout: EstimatedRTT plus "safety margin"

large variation in EstimatedRTT -> larger safety margin

first estimate of how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
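The timeout computation can be sketched as a small estimator that keeps a smoothed RTT and a smoothed deviation, each an exponentially weighted moving average with weights written here as alpha and beta. Several details are assumptions of this sketch and not stated on the slides: alpha = 0.125 (a commonly used value), the initial deviation of half the first sample (as RFC 6298 does), and updating the deviation before the smoothed RTT. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                # assumed weight for EstimatedRTT
        self.beta = beta                  # weight for DevRTT (typically 0.25)
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation

    def update(self, sample_rtt):
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        # (assumption: uses EstimatedRTT from before this sample)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(200.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())  # 112.5 62.5 362.5
```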

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service

Pipelined segments

Cumulative ACKs

Retransmissions are triggered by: timeout events, duplicate ACKs

TCP sender events

data rcvd from app: create segment with seq #; seq # is byte-stream number of first data byte in segment; start timer if not already running (think of timer as for oldest unACKed segment); expiration interval: TimeOutInterval

timeout: retransmit segment that caused timeout; restart timer

ACK rcvd: if it acknowledges previously unACKed segments, update what is known to be ACKed; start timer if there are outstanding segments

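TCP retransmission scenarios

Lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); Host A's timer for Seq=92 times out; Host A resends Seq=92, 8 bytes data; Host B replies ACK=100; SendBase = 100

Premature timeout: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; Host A's timer for Seq=92 times out before the ACKs arrive, so Host A resends Seq=92, 8 bytes data; SendBase = 100, then SendBase = 120; Host B replies ACK=120 to the resent segment; SendBase = 120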

TCP retransmission scenarios (more)

Cumulative ACK scenario: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout; SendBase = 120, so nothing is resent

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK; wait up to 500 ms for next segment; if no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit

Time-out period often relatively long

Detect lost segments via duplicate ACKs

If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            // fast retransmit
            resend segment with sequence number y
        }
    }
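The ACK-handling rule for fast retransmit can be expressed as a runnable sketch. It leaves out the retransmission timer and real segment sending, and simply records which sequence numbers would be resent; `FastRetransmitSender` and its attributes are hypothetical names.

```python
from collections import Counter


class FastRetransmitSender:
    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = Counter()   # count of duplicate ACKs seen per ACK value
        self.resent = []            # seq #s resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # new data acknowledged: advance SendBase
            # (timer handling is omitted in this sketch)
            self.send_base = y
        else:
            # a duplicate ACK for an already ACKed segment
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3:
                # fast retransmit: resend the segment starting at y
                self.resent.append(y)


sender = FastRetransmitSender(send_base=92)
for ack in (100, 100, 100, 100):    # one new ACK, then three duplicates
    sender.on_ack(ack)
print(sender.send_base, sender.resent)  # 100 [100]
```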

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow
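Both halves of flow control reduce to simple arithmetic. In this sketch the spare room is taken to be the receive buffer size minus the bytes received but not yet read by the application; the names `rcv_buffer`, `last_byte_rcvd`, and `last_byte_read` are assumptions introduced for the illustration, as are the two function names.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises (assumed bookkeeping)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """Sender keeps unACKed data, including the new bytes, within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                               # 2096
print(can_send(5000, 4000, 1000, window))   # True: 2000 bytes would be unACKed
print(can_send(5000, 4000, 1200, window))   # False: 2200 exceeds the window
```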

TCP Connection Management

Three-way handshake:

Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data

Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
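The three steps can be traced by listing the segments exchanged. The sketch relies on the fact that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one; the tuple layout and the name `three_way_handshake` are made up for the illustration.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    # Step 1: client sends SYN with its initial seq #, no data
    syn = ("client", "SYN", client_isn, None)
    # Step 2: server replies with SYNACK carrying its own initial seq #
    synack = ("server", "SYNACK", server_isn, client_isn + 1)
    # Step 3: client ACKs the SYNACK (this segment may carry data)
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```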

TCP Connection Management (cont)

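[Figure: closing a connection. Client sends FIN (close); server replies ACK, then sends its own FIN (close); client replies ACK and enters timed wait before the connection is closed]

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle]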

Principles of Congestion Control

Congestion, informally: "too many sources sending too much data too fast for network to handle"

different from flow control

manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

two senders, two receivers

one router, infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

[Figure: Host A sends $\lambda_{in}$ (original data); Host B receives $\lambda_{out}$; unlimited shared output link buffers]

Causes/costs of congestion: scenario 2

one router, finite buffers

sender retransmission of lost packet

[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data); Host B receives $\lambda_{out}$; finite shared output link buffers]

always: $\lambda_{in} = \lambda_{out}$ (goodput)

"perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$

retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion: more work (retrans) for given "goodput"; unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders

multihop paths

timeout/retransmit

when packet dropped, any "upstream" transmission capacity used for that packet was wasted

[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data); Host B receives $\lambda_{out}$; finite shared output link buffers]

Page 27: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP reliable data transfer

TCP creates reliable data transfer service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks Retransmissions are triggered by

timeout events duplicate acks

TCP sender eventsdata rcvd from app Create segment with

seq seq is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment that

caused timeout restart timer

Ack rcvd If acknowledges

previously unacked segments update what is known to

be acked start timer if there are

outstanding segments

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment startsat lower end of gap

Fast Retransmit Time-out period often relatively long Detect lost segments via duplicate ACKs If sender receives 3 ACKs for the same data it supposes that

segment after ACKed data was lost

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

fast retransmita duplicate ACK for already ACKed segment

TCP Flow Control

receiver

speed-matching service matching the send rate to the receiving apprsquos drain rate

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

TCP Connection Management

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial seq

Step 3 client receives SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too much data

too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of

lost packet finite shared output link buffers

Host Ain original data

Host B

out

in original data plus retransmitted data

ldquocostsrdquo of congestion more work (retrans) for

given ldquogoodputrdquo unneeded retransmissions

link carries multiple copies of pkt

always (goodput)

ldquoperfectrdquo retransmission only when

loss

retransmission of delayed (not lost)

packet makes larger (than

perfect case) for same

in

out

=

in

out

gt

in

out

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit when packet dropped any

ldquoupstream transmission capacity used for that packet was wasted

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems

Two broad approaches towards congestion control

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission LastByteSent-LastByteAcked

CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

multiplicative decrease cut CongWin in half after loss event

additive increase increase CongWin by 1 MSS every RTT in the absence of loss events probing

Long-lived TCP connection

TCP Slow Start

When connection begins increase rate exponentially until first loss event double CongWin every

RTT done by incrementing CongWin for every ACK received

Summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Refinement

After 3 dup ACKs CongWin is cut in half window then grows linearly (congestion avoidance)

But after timeout event CongWin instead set to 1 MSS window then grows exponentially (slow start) to a threshold then grows linearly (congestion

avoidance)

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

TCP Futures

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed needed

LRTT

MSS221

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

  • Additive increase gives slope of 1 as throughput increases
  • Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
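The convergence argument can be checked with a toy simulation. It assumes both flows see every loss at the same time (synchronized losses), that a loss occurs whenever the combined rate exceeds the capacity, and that rates are in arbitrary units; real connections are not this tidy.

```python
def simulate_two_flows(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one bottleneck (synchronized losses assumed)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # both halve: the gap between them halves too
        else:
            x1, x2 = x1 + 1, x2 + 1  # both add 1: the gap is unchanged
    return x1, x2

x1, x2 = simulate_two_flows(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))
```

Additive increase never changes the difference between the two rates, while each multiplicative decrease halves it, so the pair drifts toward the equal-share line.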

Chapter 3 Summary

Principles behind transport layer services:

  • multiplexing, demultiplexing
  • reliable data transfer
  • flow control
  • congestion control

Instantiation and implementation in the Internet: UDP, TCP

Next: leaving the network "edge" (application, transport layers), into the network "core"

  • Chapter 3 Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Principles of Reliable data transfer
  • rdt1.0: reliable transfer over a reliable channel
  • rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • TCP Overview: RFCs 793, 1122, 1323, 2018, 2581
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair
  • Chapter 3 Summary

TCP Round Trip Time and Timeout

Setting the timeout

  • EstimatedRTT plus "safety margin"
  • large variation in EstimatedRTT -> larger safety margin

First, estimate how much SampleRTT deviates from EstimatedRTT:

$\text{DevRTT} = (1-\beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$

(typically, $\beta = 0.25$)

Then set the timeout interval:

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
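A minimal sketch of one update step follows. The DevRTT and TimeoutInterval formulas are the ones on this slide; the EstimatedRTT update (an exponentially weighted moving average with alpha = 0.125) and the choice to update DevRTT before EstimatedRTT are assumptions here, following common practice. Units are milliseconds.

```python
def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Fold one SampleRTT into the estimates and return the new timeout."""
    # Deviation uses the EstimatedRTT from before this sample (assumed ordering).
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    # EWMA of the RTT itself; alpha = 0.125 is an assumed typical value.
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval

est, dev, timeout = update_rtt(estimated_rtt=100.0, dev_rtt=5.0, sample_rtt=120.0)
print(est, dev, timeout)  # 102.5 8.75 137.5
```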

TCP reliable data transfer

TCP creates reliable data transfer service on top of IP's unreliable service

  • Pipelined segments
  • Cumulative ACKs

Retransmissions are triggered by:

  • timeout events
  • duplicate ACKs

TCP sender events

Data rcvd from app:

  • create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running (think of timer as for oldest unACKed segment)
  • expiration interval: TimeOutInterval

Timeout:

  • retransmit segment that caused timeout
  • restart timer

ACK rcvd:

  • if it acknowledges previously unACKed segments, update what is known to be ACKed
  • start timer if there are outstanding segments
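The three sender events can be modeled with a toy class. This is an illustrative sketch only: there is no real network or clock, the timer is a boolean flag, and the class and attribute names are hypothetical. The sequence numbers in the usage lines are example values.

```python
class SimpleTcpSender:
    """Toy model of the three sender events; no real network or clock."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq  # seq # of oldest unACKed byte
        self.next_seq = initial_seq   # seq # of the next new byte
        self.unacked = {}             # seq # -> payload awaiting an ACK
        self.timer_running = False
        self.sent = []                # log of (seq, payload) transmissions

    def on_data_from_app(self, payload):
        seq = self.next_seq           # byte-stream number of first data byte
        self.unacked[seq] = payload
        self.sent.append((seq, payload))
        self.next_seq += len(payload)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)       # oldest unACKed segment
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True     # restart timer

    def on_ack(self, ack):
        if ack > self.send_base:      # acknowledges previously unACKed data
            self.send_base = ack
            self.unacked = {s: p for s, p in self.unacked.items() if s >= ack}
            self.timer_running = bool(self.unacked)

sender = SimpleTcpSender(initial_seq=92)
sender.on_data_from_app(b"12345678")  # Seq=92, 8 bytes
sender.on_data_from_app(b"x" * 20)    # Seq=100, 20 bytes
sender.on_timeout()                   # resends Seq=92
sender.on_ack(120)                    # cumulative ACK covers both segments
print(sender.send_base, [s for s, _ in sender.sent])
```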

TCP retransmission scenarios

[Figure: two Host A / Host B timelines. Lost ACK scenario: Host A sends Seq=92, 8 bytes data; the ACK=100 from Host B is lost (X); after the timeout Host A resends Seq=92, 8 bytes data and receives ACK=100; SendBase = 100. Premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B returns ACK=100 and ACK=120; the Seq=92 timeout fires early, so Host A resends Seq=92, 8 bytes data and receives ACK=120; SendBase = 100, then SendBase = 120.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; the ACK=100 from Host B is lost (X), but ACK=120 arrives before the timeout; SendBase = 120.]

TCP ACK generation [RFC 1122 RFC 2581]

Event at receiver -> TCP receiver action

Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK

Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments

Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte

Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
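The four table rows map onto a small decision function. This is a simplified sketch: the arguments and the returned labels are hypothetical, a gap-filling segment is assumed to start at the lower end of the gap (that is, at the expected seq #), and segments below the expected seq # are outside the table and reported as such.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_exists):
    """Pick the receiver's ACK behavior for one arriving segment."""
    if seg_seq > expected_seq:
        return "duplicate_ack"   # out of order: re-ACK the next expected byte
    if seg_seq == expected_seq:
        if gap_exists:
            return "immediate_ack"   # segment starts at lower end of the gap
        if ack_pending:
            return "cumulative_ack"  # ACK both in-order segments at once
        return "delayed_ack"         # wait up to 500 ms for the next segment
    return "unspecified"             # old data: not covered by the table

print(receiver_action(100, 100, ack_pending=False, gap_exists=False))
print(receiver_action(120, 100, ack_pending=False, gap_exists=False))
```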

Fast Retransmit

Time-out period often relatively long

Detect lost segments via duplicate ACKs

If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            // fast retransmit
            resend segment with sequence number y
        }
    }
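The ACK-handling pseudocode translates almost line for line into runnable code. This is a sketch with hypothetical names: `outstanding` stands in for "there are currently not-yet-acknowledged segments", and a retransmission is recorded in a list instead of being sent.

```python
from collections import defaultdict

class FastRetransmitSender:
    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = defaultdict(int)  # ACK value -> duplicate count
        self.retransmitted = []           # seq #s resent via fast retransmit
        self.timer_running = False

    def on_ack(self, y, outstanding):
        """outstanding: True if unACKed segments remain after this ACK."""
        if y > self.send_base:
            self.send_base = y
            if outstanding:
                self.timer_running = True     # start timer
        else:
            self.dup_acks[y] += 1             # duplicate ACK for ACKed data
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)  # fast retransmit of segment y

sender = FastRetransmitSender(send_base=92)
sender.on_ack(100, outstanding=True)      # new ACK: SendBase moves to 100
for _ in range(4):
    sender.on_ack(100, outstanding=True)  # four duplicate ACKs
print(sender.send_base, sender.retransmitted)
```

Because the check is for exactly three duplicates, the fourth duplicate ACK does not trigger a second retransmission.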

TCP Flow Control

Flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

Speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow
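Both sides of flow control fit in two small functions. The sender-side check is the rule stated above; the receiver-side formula for spare room (buffer size minus data received but not yet read) and the names `rcv_buffer`, `last_byte_rcvd`, and `last_byte_read` are assumptions not given in this section.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer (assumed formula)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(last_byte_sent, last_byte_acked, n_bytes, window):
    """True if sending n_bytes more keeps unACKed data within RcvWindow."""
    return (last_byte_sent + n_bytes) - last_byte_acked <= window

window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                 # 2096
print(can_send(5000, 4000, 1000, window))     # True: 2000 bytes in flight
print(can_send(5000, 4000, 1200, window))     # False: 2200 bytes in flight
```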

TCP Connection Management

Three-way handshake:

Step 1: client host sends TCP SYN segment to server

  • specifies initial seq #
  • no data

Step 2: server host receives SYN, replies with SYNACK segment

  • server allocates buffers
  • specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
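The three steps can be traced with plain dictionaries standing in for segments. This sketch is illustrative: the field names are hypothetical, the initial sequence numbers are arbitrary example values, and the rule that each side acknowledges the peer's initial seq # plus one is standard TCP behavior not spelled out on the slide.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}                # step 1: no data
    synack = {"flags": "SYN+ACK", "seq": server_isn,
              "ack": syn["seq"] + 1}                         # step 2
    ack = {"flags": "ACK", "seq": client_isn + 1,
           "ack": synack["seq"] + 1}                         # step 3
    return [syn, synack, ack]

segments = three_way_handshake(client_isn=42, server_isn=79)
for segment in segments:
    print(segment)
```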

TCP Connection Management (cont)

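[Figure: closing a connection. The client sends FIN and the server replies with ACK; the server then sends its own FIN and the client replies with ACK. The client goes through a timed wait before the connection is closed.]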

TCP Connection Management (cont)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams]

Principles of Congestion Control

Congestion:

  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control
  • manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • no retransmission
  • large delays when congested
  • maximum achievable throughput

[Figure: Host A sends $\lambda_{in}$ (original data); Host B receives $\lambda_{out}$; unlimited shared output link buffers]

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of lost packet

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data); Host B receives $\lambda_{out}$; finite shared output link buffers]

  • always: $\lambda_{in} = \lambda_{out}$ (goodput)
  • "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
  • retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"Costs" of congestion:

  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit
  • when packet dropped, any "upstream" transmission capacity used for that packet was wasted

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data); Host B receives $\lambda_{out}$; finite shared output link buffers]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:

  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

Network-assisted congestion control:

  • routers provide feedback to end systems

TCP Congestion Control

End-end control (no network assistance)

Sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$

Roughly: $\text{rate} = \text{CongWin} / \text{RTT}$ bytes/sec

CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?

  • loss event = timeout or 3 duplicate ACKs
  • TCP sender reduces rate (CongWin) after loss event

Three mechanisms: AIMD; slow start; conservative after timeout events

Page 31: Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultiple xing m reliable data transfer.

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending.
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected.
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: arrival of segment that partially or completely fills gap.
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap.
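The four receiver rules above can be sketched as a small decision function. This is a minimal illustration, not an implementation from the slides: the class name `AckPolicy`, the action labels, and the dictionary used to buffer out-of-order segments are all assumptions made for the example, and the 500 ms wait is modelled as an explicit timer callback rather than a real clock.

```python
class AckPolicy:
    """Receiver-side ACK decisions for the four events listed above (a sketch)."""

    def __init__(self, expected_seq):
        self.expected = expected_seq   # next in-order byte we are waiting for
        self.ack_pending = False       # True while a delayed ACK is being held
        self.out_of_order = {}         # buffered segments above a gap: seq -> length

    def on_segment(self, seq, length):
        """Return (action, ack_number) for an arriving segment."""
        if seq != self.expected:
            # Out-of-order (or already received) segment: repeat the ACK for
            # the next expected byte. Buffer it only if it lies above the gap.
            if seq > self.expected:
                self.out_of_order[seq] = length
            self.ack_pending = False
            return ("duplicate_ack", self.expected)

        had_gap = bool(self.out_of_order)
        self.expected += length
        # Absorb any buffered segments that are now contiguous.
        while self.expected in self.out_of_order:
            self.expected += self.out_of_order.pop(self.expected)

        if had_gap:
            # Segment started at the lower end of the gap: ACK immediately.
            self.ack_pending = False
            return ("immediate_ack", self.expected)
        if self.ack_pending:
            # A second in-order segment arrived: one ACK covers both.
            self.ack_pending = False
            return ("cumulative_ack", self.expected)
        # First in-order segment: hold the ACK and wait for the next segment.
        self.ack_pending = True
        return ("delayed_ack", self.expected)

    def on_delayed_ack_timer(self):
        """Called when the wait (up to 500 ms) expires with no further segment."""
        if self.ack_pending:
            self.ack_pending = False
            return ("ack", self.expected)
        return None
```

Feeding it segments 100, 120, 160, 140 (each 20 bytes) walks through all four rows in order: a delayed ACK, a cumulative ACK, a duplicate ACK for byte 140, and an immediate ACK once the gap is filled.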

Fast Retransmit

Time-out period often relatively long

Detect lost segments via duplicate ACKs

If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            // fast retransmit
            resend segment with sequence number y
        }
    }
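The ACK-handling event above can be sketched in Python as follows. This is a simplified model under stated assumptions: the class `FastRetransmitSender`, its `unacked` dictionary, and the `retransmitted` log are hypothetical names for illustration, segments are assumed to be acknowledged on segment boundaries, and the timer is reduced to a boolean flag.

```python
class FastRetransmitSender:
    """Sender-side ACK handling with fast retransmit (illustrative sketch)."""

    def __init__(self, send_base):
        self.send_base = send_base   # oldest unacknowledged byte
        self.dup_acks = 0            # duplicate ACKs seen for send_base
        self.unacked = {}            # seq -> payload of segments in flight
        self.timer_running = False
        self.retransmitted = []      # seq numbers resent by fast retransmit

    def send(self, seq, payload):
        self.unacked[seq] = payload
        self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: slide the window forward.
            self.send_base = y
            self.dup_acks = 0
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            # Restart the timer only if something is still outstanding.
            self.timer_running = bool(self.unacked)
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                # Fast retransmit: resend before the timer expires.
                self.retransmitted.append(y)
```

With segments 92, 100 and 120 in flight, one ACK=100 followed by three more ACK=100 triggers exactly one retransmission of the segment starting at byte 100.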

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow
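A minimal sketch of the two sides of flow control. The names `rcv_buffer`, `last_byte_rcvd` and `last_byte_read` follow the usual textbook bookkeeping for the receive buffer and are assumptions here, since this slide only names RcvWindow; the functions are illustrative, not a socket API.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: buffer size minus unread data."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """Sender check: keep unACKed data (after sending nbytes) within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


# A 4096-byte buffer holding 2000 unread bytes leaves 2096 bytes of spare room.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
```

With 1000 bytes already in flight, the sender may add another 1000 bytes under this window but not 1200.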

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data

Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
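The three steps can be sketched as the segments exchanged. The dictionary layout and the function name `three_way_handshake` are assumptions for illustration. The `+ 1` in the acknowledgment numbers reflects the standard TCP rule that a SYN consumes one sequence number, which the slide does not spell out.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as plain dictionaries (a sketch)."""
    # Step 1: client sends SYN with its initial sequence number, no data.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK, carrying its own initial sequence
    # number and acknowledging the client's SYN.
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn, "ack": syn["seq"] + 1}
    # Step 3: client acknowledges the server's SYN; this segment may carry data.
    ack = {"flags": {"ACK"}, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```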

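TCP Connection Management (cont)

[Figure: closing a connection. The client closes and sends FIN; the server replies with ACK, closes, and sends its own FIN; the client replies with ACK and enters timed wait before the connection is closed.]

TCP Connection Management (cont)

[Figures: TCP client lifecycle; TCP server lifecycle.]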

Principles of Congestion Control

Congestion: informally, "too many sources sending too much data too fast for network to handle"

different from flow control!

manifestations:

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

two senders, two receivers

one router, infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

[Figure: Host A and Host B send into a router with unlimited shared output link buffers; $\lambda_{in}$: original data; $\lambda_{out}$.]

Causes/costs of congestion: scenario 2

one router, finite buffers

sender retransmission of lost packet

[Figure: Host A and Host B send into a router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$.]

always: $\lambda_{in} = \lambda_{out}$ (goodput)

"perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$

retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:

more work (retrans) for given "goodput"

unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders

multihop paths

timeout/retransmit

when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: Host A and Host B send across routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$.]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:

no explicit feedback from network

congestion inferred from end-system observed loss, delay

approach taken by TCP

Network-assisted congestion control:

routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$

Roughly, $\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate ACKs

TCP sender reduces rate (CongWin) after loss event

three mechanisms:

AIMD

slow start

conservative after timeout events

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event

additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window of a long-lived TCP connection over time, a sawtooth on an axis marked 8 Kbytes, 16 Kbytes, 24 Kbytes.]
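A short sketch of the AIMD sawtooth, measured in units of MSS. The function name `aimd` and the idea of passing in the set of RTT rounds at which a loss event occurs are assumptions for illustration; real loss events depend on the network, not on a fixed schedule, and the floor of 1 MSS is a choice made for this example.

```python
def aimd(cong_win, loss_rounds, rounds, mss=1):
    """Return the congestion window at the start of each RTT round."""
    history = []
    for r in range(rounds):
        history.append(cong_win)
        if r in loss_rounds:
            # Multiplicative decrease: cut the window in half (floor of 1 MSS).
            cong_win = max(mss, cong_win // 2)
        else:
            # Additive increase: probe for bandwidth with 1 more MSS per RTT.
            cong_win += mss
    return history


trace = aimd(cong_win=8, loss_rounds={4}, rounds=7)
```

Starting at 8 MSS with a loss in round 4, the window climbs 8, 9, 10, 11, 12, is halved to 6, and then resumes its linear climb.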

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment to Host B in the first RTT, two segments in the next, then four segments.]
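The doubling follows directly from the per-ACK increment, which the sketch below makes explicit. It assumes one ACK comes back for every segment sent in a round (no delayed ACKs) and no loss; the function name `slow_start` is hypothetical and the window is counted in MSS.

```python
def slow_start(rtts, mss=1):
    """Return the congestion window (in MSS) at the start of each RTT."""
    cong_win = mss
    per_rtt = []
    for _ in range(rtts):
        per_rtt.append(cong_win)
        acks = cong_win // mss        # one ACK per segment sent this RTT
        for _ in range(acks):
            cong_win += mss           # increment CongWin for every ACK
    return per_rtt


windows = slow_start(4)
```

Adding 1 MSS per ACK means a window of n segments grows to 2n after one round trip, which gives the one, two, four segment pattern in the figure.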

Refinement

After 3 dup ACKs: CongWin is cut in half; window then grows linearly (congestion avoidance)

But after timeout event: CongWin instead set to 1 MSS; window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
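The four rules can be sketched as a small state machine, advanced once per RTT. Several details are assumptions made for the example rather than statements from the slides: the class name `CongestionControl`, treating CongWin equal to Threshold as congestion avoidance, capping the slow-start doubling at Threshold, using integer halving with a floor of 1 MSS, and the initial Threshold of 8 MSS.

```python
class CongestionControl:
    """CongWin/Threshold updates from the summary rules (window in MSS)."""

    def __init__(self, mss=1, threshold=8):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold

    @property
    def phase(self):
        if self.cong_win < self.threshold:
            return "slow_start"
        return "congestion_avoidance"

    def on_rtt_without_loss(self):
        if self.phase == "slow_start":
            # Exponential growth, capped so it does not overshoot Threshold.
            self.cong_win = min(2 * self.cong_win, self.threshold)
        else:
            # Linear growth: 1 MSS per RTT.
            self.cong_win += self.mss

    def on_triple_duplicate_ack(self):
        self.threshold = max(self.cong_win // 2, self.mss)
        self.cong_win = self.threshold

    def on_timeout(self):
        self.threshold = max(self.cong_win // 2, self.mss)
        self.cong_win = self.mss
```

After a triple duplicate ACK the sender continues in congestion avoidance from the halved window, while a timeout sends it back to slow start from 1 MSS, which is the "conservative after timeout" behaviour.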

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is $W/\text{RTT}$.

Just after loss, window drops to $W/2$, throughput to $\frac{W}{2\,\text{RTT}}$.

Average throughput: $0.75\,W/\text{RTT}$
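The 0.75 factor is the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, assuming an idealised sawtooth that grows by one unit per RTT; the function names are hypothetical.

```python
def average_window(w):
    """Mean window over one sawtooth cycle that climbs from w/2 to w."""
    samples = range(w // 2, w + 1)    # one sample per RTT
    return sum(samples) / len(samples)


def average_throughput(w, rtt):
    """Average throughput implied by the sawtooth: 0.75 * W / RTT."""
    return average_window(w) / rtt


ratio = average_window(100) / 100
```

For W = 100 the samples run from 50 to 100 and their mean is 75, so the ratio to W is 0.75.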

TCP Futures

Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

$\Rightarrow L = 2 \cdot 10^{-10}$. Wow!

New versions of TCP for high-speed needed!
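The two numbers in this example can be reproduced with a few lines of arithmetic. The function names are hypothetical; the loss-rate function simply inverts the throughput formula 1.22 · MSS / (RTT · √L), with MSS converted to bits so the units match a rate in bits per second.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Loss rate L that yields rate_bps under 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)   # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)      # about 2e-10
```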

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
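A small simulation illustrates why the trajectory drifts toward the equal-share line. It assumes both sessions see every loss event at the same time and have the same RTT, which is an idealisation; the function name `two_flow_aimd` and the starting rates are hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both add 1, so the gap stays the same.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = two_flow_aimd(x1=1.0, x2=80.0, capacity=100.0, rounds=1000)
```

Additive increase leaves the difference between the two rates unchanged while each multiplicative decrease halves it, so the difference shrinks geometrically with every loss event.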

Chapter 3: Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet:

UDP

TCP

Next: leaving the network "edge" (application, transport layers), into the network "core"

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3: Summary

TCP Flow Control

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app's drain rate

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow
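The advertise-and-limit rule can be sketched in a few lines. This is a minimal illustration, not TCP's implementation: the `Receiver` class, the `sendable_bytes` function and the buffer bookkeeping fields (`rcv_buffer`, `last_byte_rcvd`, `last_byte_read`) are assumptions introduced here; only RcvWindow and the limit on unACKed data come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    rcv_buffer: int          # total buffer size in bytes (assumed field)
    last_byte_rcvd: int = 0  # highest byte number placed in the buffer
    last_byte_read: int = 0  # highest byte number read by the application

    def rcv_window(self) -> int:
        """Spare room advertised to the sender as RcvWindow."""
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, rcv_window: int) -> int:
    """Bytes the sender may still transmit without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data
    return max(0, rcv_window - in_flight)


rx = Receiver(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
window = rx.rcv_window()                       # 4096 - 2000 = 2096
allowed = sendable_bytes(3500, 3000, window)   # 2096 - 500 = 1596
print(window, allowed)
```

When the application drains the buffer slowly, `rcv_window()` shrinks and the sender's allowance shrinks with it, which is the speed-matching effect described above.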

TCP Connection Management

Three way handshake:

Step 1: client host sends TCP SYN segment to server

  • specifies initial seq #
  • no data

Step 2: server host receives SYN, replies with SYNACK segment

  • server allocates buffers
  • specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
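The three steps can be traced with a small model of the segments exchanged. This is a sketch only: the `Segment` type and `three_way_handshake` function are hypothetical names, and the acknowledgment numbers follow standard TCP behavior (a SYN consumes one sequence number), which the slide does not spell out.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    flags: str            # "SYN", "SYNACK", or "ACK"
    seq: int              # sequence number carried by this segment
    ack: Optional[int]    # acknowledgment number, if any


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments of connection setup, in order."""
    # Step 1: client picks its initial sequence number; no data
    syn = Segment("SYN", seq=client_isn, ack=None)
    # Step 2: server picks its own initial sequence number and ACKs the SYN
    synack = Segment("SYNACK", seq=server_isn, ack=syn.seq + 1)
    # Step 3: client ACKs the SYNACK (this segment may also carry data)
    ack = Segment("ACK", seq=syn.seq + 1, ack=synack.seq + 1)
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```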

TCP Connection Management (cont)

[Figure: connection teardown timeline between client and server. Labels: close, FIN, ACK, close, FIN, ACK, timed wait, closed]

TCP Connection Management (cont)

[Figure: TCP client lifecycle]

[Figure: TCP server lifecycle]

Principles of Congestion Control

Congestion:

  • informally: "too many sources sending too much data too fast for network to handle"
  • different from flow control!
  • manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queueing in router buffers)

Causes/costs of congestion: scenario 1

  • two senders, two receivers
  • one router, infinite buffers
  • no retransmission

  • large delays when congested
  • maximum achievable throughput

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; $\lambda_{out}$ is the delivered throughput]

Causes/costs of congestion: scenario 2

  • one router, finite buffers
  • sender retransmission of lost packet

[Figure: Host A sends original data at rate $\lambda_{in}$, and original data plus retransmitted data at rate $\lambda'_{in}$, into a router with finite shared output link buffers; Host B receives at rate $\lambda_{out}$]

  • always: $\lambda_{in} = \lambda_{out}$ (goodput)
  • "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
  • retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

"costs" of congestion:

  • more work (retrans) for given "goodput"
  • unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

  • four senders
  • multihop paths
  • timeout/retransmit

when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:

  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP

Network-assisted congestion control:

  • routers provide feedback to end systems

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$$

Roughly,

$$\text{rate} = \frac{\text{CongWin}}{RTT} \ \text{Bytes/sec}$$

CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?

  • loss event = timeout or 3 duplicate acks
  • TCP sender reduces rate (CongWin) after loss event

three mechanisms:

  • AIMD
  • slow start
  • conservative after timeout events
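The sending limit and the rough rate estimate can be written as two small helpers. This is an illustrative sketch: the function names `can_send` and `approx_rate` and the example numbers are assumptions, and the rate is only the "roughly one window per round trip" approximation given above.

```python
def can_send(last_byte_sent: int, last_byte_acked: int,
             cong_win: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within CongWin."""
    return (last_byte_sent + nbytes) - last_byte_acked <= cong_win


def approx_rate(cong_win: int, rtt: float) -> float:
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win / rtt


cong_win = 16_000                        # bytes (example value)
rate = approx_rate(cong_win, rtt=0.1)    # about 160,000 bytes/sec
print(f"rate is roughly {rate:.0f} bytes/sec")
print(can_send(10_000, 2_000, cong_win, 1_460))   # 9,460 unACKed bytes: allowed
print(can_send(17_000, 2_000, cong_win, 1_460))   # 16,460 unACKed bytes: blocked
```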

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event

additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (axis ticks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
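The sawtooth shape of a long-lived connection follows directly from the two AIMD rules. The sketch below measures the window in units of MSS and injects a loss every eighth RTT purely for illustration; the loss pattern, the floor of 1 MSS and the name `aimd_step` are assumptions, not part of the slides.

```python
def aimd_step(cong_win: float, loss: bool) -> float:
    """Advance the window by one RTT; cong_win is measured in MSS."""
    if loss:
        return max(1.0, cong_win / 2)   # multiplicative decrease (floor assumed)
    return cong_win + 1.0               # additive increase: +1 MSS per RTT


cong_win = 8.0
trace = []
for rtt in range(1, 25):
    # Assumed loss pattern: one loss event every 8th RTT
    cong_win = aimd_step(cong_win, loss=(rtt % 8 == 0))
    trace.append(cong_win)

print(trace)
```

Plotting `trace` against the RTT index gives the familiar ramp-and-drop pattern.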

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:

  • double CongWin every RTT
  • done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B; Host A sends one segment, then two segments, then four segments in successive RTTs]
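Incrementing the window once per ACK is what produces the doubling per RTT. A minimal sketch, assuming every segment sent in a round is acknowledged individually and no loss occurs (the function name `slow_start_rtts` is illustrative):

```python
def slow_start_rtts(initial_mss: int, rounds: int) -> list:
    """Window (in MSS) at the start of each RTT during slow start."""
    cong_win = initial_mss
    history = [cong_win]
    for _ in range(rounds):
        acks = cong_win              # assumed: one ACK per segment sent this RTT
        for _ack in range(acks):
            cong_win += 1            # +1 MSS for every ACK received
        history.append(cong_win)
    return history


print(slow_start_rtts(1, 4))   # [1, 2, 4, 8, 16]
```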

Refinement

After 3 dup ACKs:

  • CongWin is cut in half
  • window then grows linearly (congestion avoidance)

But after timeout event:

  • CongWin instead set to 1 MSS
  • window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
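The four rules map onto a small state holder. This is a sketch under stated assumptions rather than a faithful TCP implementation: windows are in units of MSS, growth is applied once per RTT, the initial Threshold of 8 is arbitrary, slow start is capped at Threshold, and the case where CongWin equals Threshold is treated as congestion avoidance. The class and method names are hypothetical.

```python
class CongestionControl:
    """Sketch of the four summary rules; window values are in MSS."""

    def __init__(self, threshold: float = 8.0):
        self.cong_win = 1.0
        self.threshold = threshold   # initial value is an assumption

    def on_rtt_without_loss(self) -> None:
        if self.cong_win < self.threshold:
            # slow start: exponential growth, capped at Threshold (assumed)
            self.cong_win = min(self.cong_win * 2, self.threshold)
        else:
            # congestion avoidance: linear growth
            self.cong_win += 1.0

    def on_triple_dup_ack(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0


cc = CongestionControl(threshold=8.0)
for _ in range(5):
    cc.on_rtt_without_loss()
print("after 5 loss-free RTTs:", cc.cong_win)
cc.on_triple_dup_ack()
print("after triple duplicate ACK:", cc.cong_win, cc.threshold)
cc.on_timeout()
print("after timeout:", cc.cong_win, cc.threshold)
```

The difference between the two loss reactions is visible in the last two prints: a triple duplicate ACK resumes from half the window, while a timeout restarts from 1 MSS.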

  • Causescosts of congestion scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair
  • Chapter 3 Summary
Page 40: Chapter 3: Transport Layer

Causes/costs of congestion: scenario 1

Two senders, two receivers.

One router, infinite buffers.

No retransmission.

Large delays when congested.

Maximum achievable throughput.

[Figure: Host A sends original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; Host B receives at rate $\lambda_{out}$]

Causes/costs of congestion: scenario 2

One router, finite buffers.

Sender retransmission of lost packet.

[Figure: Host A offers original data at rate $\lambda_{in}$, and original data plus retransmitted data at rate $\lambda'_{in}$, to a router with finite shared output link buffers; Host B receives at rate $\lambda_{out}$]

Always: $\lambda_{in} = \lambda_{out}$ (goodput).

"Perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$.

Retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$.

"Costs" of congestion:

More work (retransmissions) for a given "goodput".

Unneeded retransmissions: link carries multiple copies of a packet.

Causes/costs of congestion: scenario 3

Four senders, multihop paths, timeout/retransmit.

When a packet is dropped, any "upstream" transmission capacity used for that packet was wasted.

[Figure: Host A offers original data at rate $\lambda_{in}$, and original data plus retransmitted data at rate $\lambda'_{in}$, across routers with finite shared output link buffers; Host B receives at rate $\lambda_{out}$]

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:

no explicit feedback from network

congestion inferred from end-system observed loss, delay

approach taken by TCP

Network-assisted congestion control:

routers provide feedback to end systems

TCP Congestion Control

End-end control (no network assistance).

Sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{CongWin}$

Roughly, $\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec

CongWin is dynamic, a function of perceived network congestion.

How does sender perceive congestion?

Loss event = timeout or 3 duplicate ACKs.

TCP sender reduces rate (CongWin) after loss event.

Three mechanisms: AIMD, slow start, conservative behavior after timeout events.
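The window limit and the rough rate estimate on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `can_send` and `approx_rate` are hypothetical, and the receiver's flow-control window is ignored.

```python
def can_send(last_byte_sent: int, last_byte_acked: int,
             cong_win: int, nbytes: int) -> bool:
    """Return True if sending nbytes more keeps the unacknowledged
    data (LastByteSent - LastByteAcked) within CongWin."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cong_win


def approx_rate(cong_win: int, rtt: float) -> float:
    """Rough sending rate in bytes/sec: about one CongWin per RTT."""
    return cong_win / rtt


if __name__ == "__main__":
    # 3000 bytes in flight, 8000-byte window: a 1460-byte segment fits.
    print(can_send(5000, 2000, 8000, 1460))
    # 8000-byte window over a 0.5 s round trip.
    print(approx_rate(8000, 0.5))
```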

TCP AIMD

Multiplicative decrease: cut CongWin in half after a loss event.

Additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing).

[Figure: congestion window (axis marks at 8, 16, and 24 Kbytes) versus time for a long-lived TCP connection]
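A minimal sketch of the AIMD rule, in Python, may help make the shape of the window curve concrete. The loss model is an assumption made only for illustration: a loss event is taken to occur whenever the window reaches a fixed value `loss_at`. The window is counted in MSS units, and `aimd_trace` is a hypothetical name.

```python
from typing import List

MSS = 1  # the window is counted in units of one MSS


def aimd_trace(start: float, loss_at: float, rtts: int) -> List[float]:
    """Return CongWin after each RTT.

    Assumption: a loss event happens whenever the window has
    reached loss_at (a stand-in for the path capacity).
    """
    cong_win = start
    trace = []
    for _ in range(rtts):
        if cong_win >= loss_at:
            cong_win = cong_win / 2   # multiplicative decrease
        else:
            cong_win += MSS           # additive increase: +1 MSS per RTT
        trace.append(cong_win)
    return trace


if __name__ == "__main__":
    print(aimd_trace(start=4, loss_at=8, rtts=10))
```

Under this loss model the window climbs by one MSS per RTT, is halved, and climbs again, repeating the same cycle.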

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin for every ACK received

Summary: initial rate is slow but ramps up exponentially fast.

[Figure: timeline between Host A and Host B; Host A sends one segment in the first RTT, two segments in the next, then four segments]
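The link between "increment CongWin for every ACK" and "double CongWin every RTT" can be checked with a short Python sketch. It assumes no loss, one ACK per segment (no delayed ACKs), and a window counted in MSS units; `slow_start_rounds` is a hypothetical name.

```python
from typing import List


def slow_start_rounds(rounds: int, mss: int = 1) -> List[int]:
    """Return CongWin at the start of each RTT during slow start.

    Assumptions: no loss, and every segment sent in an RTT is
    acknowledged by its own ACK before the next RTT begins.
    """
    cong_win = mss
    sizes = [cong_win]
    for _ in range(rounds):
        acks = cong_win // mss        # one ACK per segment sent this RTT
        for _ in range(acks):
            cong_win += mss           # increment CongWin for every ACK
        sizes.append(cong_win)
    return sizes


if __name__ == "__main__":
    print(slow_start_rounds(3))
```

The first three values correspond to the one, two, and four segments shown in the figure.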

Refinement

After 3 dup ACKs: CongWin is cut in half; window then grows linearly (congestion avoidance).

But after a timeout event: CongWin is instead set to 1 MSS; window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance).

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
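The four rules above can be written as a small state object. The Python below is a sketch under assumptions, not a real TCP implementation: the window is updated once per RTT rather than per ACK, values are in MSS units, a window exactly equal to Threshold is treated as congestion avoidance, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cong_win: float    # congestion window, in MSS units
    threshold: float   # slow-start threshold, in MSS units

    def on_rtt_without_loss(self) -> None:
        """One loss-free RTT (per-RTT granularity is an assumption)."""
        if self.cong_win < self.threshold:
            self.cong_win *= 2        # slow start: exponential growth
        else:
            self.cong_win += 1        # congestion avoidance: linear growth

    def on_triple_dup_ack(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0           # back to 1 MSS, then slow start


if __name__ == "__main__":
    s = CongestionState(cong_win=1, threshold=8)
    for _ in range(4):
        s.on_rtt_without_loss()
    print(s)                          # window has passed the threshold
    s.on_triple_dup_ack()
    print(s)
    s.on_timeout()
    print(s)
```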

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is $W/RTT$.

Just after loss, window drops to $W/2$, throughput to $W/(2 \cdot RTT)$.

Average throughput: $0.75 \, W/RTT$.
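The 0.75 factor is the midpoint of a window that ramps linearly from W/2 back up to W. A quick numerical check in Python, assuming that linear ramp and ignoring slow start as the slide does (the name `average_throughput` and the sampling with `steps` are illustrative choices):

```python
def average_throughput(w: float, rtt: float, steps: int = 1000) -> float:
    """Average throughput while the window ramps linearly from W/2 to W.

    The ramp is sampled at evenly spaced points; each sample
    contributes window / RTT.
    """
    windows = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(windows) / len(windows) / rtt


if __name__ == "__main__":
    w, rtt = 100.0, 0.1
    print(average_throughput(w, rtt), 0.75 * w / rtt)
```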

TCP Futures

Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput.

Requires window size W = 83,333 in-flight segments.

Throughput in terms of loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

This gives $L = 2 \cdot 10^{-10}$. Wow!

New versions of TCP for high-speed needed.
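Both numbers in this example can be reproduced by direct arithmetic. The Python sketch below computes the window from throughput times RTT, then inverts the throughput formula $1.22 \cdot MSS / (RTT \sqrt{L})$ for L. The function names are hypothetical, and MSS is converted to bits so that it matches a throughput given in bits per second.

```python
def required_window(throughput_bps: float, rtt_s: float,
                    mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float,
                       mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    print(int(required_window(10e9, 0.1, 1500)))   # segments in flight
    print(required_loss_rate(10e9, 0.1, 1500))     # about 2e-10
```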

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1 as throughput increases.

Multiplicative decrease decreases throughput proportionally.

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line; the path alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
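The convergence argument can be tried numerically. In the Python sketch below, two sessions each add one unit per round, and both halve whenever their combined rate exceeds the link capacity. Synchronized loss, equal RTTs, and the function name `share_link` are simplifying assumptions for illustration.

```python
from typing import Tuple


def share_link(x1: float, x2: float, capacity: float,
               rounds: int) -> Tuple[float, float]:
    """Run two AIMD sessions over one bottleneck link.

    Assumptions: both sessions have the same RTT and both see a
    loss whenever their combined rate exceeds the capacity.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase, slope of 1
    return x1, x2


if __name__ == "__main__":
    print(share_link(1.0, 70.0, capacity=100.0, rounds=500))
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward the equal bandwidth share.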

Chapter 3 Summary

Principles behind transport layer services: multiplexing, demultiplexing; reliable data transfer; flow control; congestion control.

Instantiation and implementation in the Internet: UDP, TCP.

Next: leaving the network "edge" (application, transport layers), into the network "core".

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • rdt1.0: reliable transfer over a reliable channel
  • rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3 Summary
Page 46: Chapter 3: Transport Layer

TCP Slow Start

When connection begins, increase rate exponentially until first loss event:

double CongWin every RTT

done by incrementing CongWin by 1 MSS for every ACK received

Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B; Host A sends one segment, then two segments, then four segments, one burst per RTT]
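The doubling on this slide can be sketched in a few lines of Python. This is only an illustration, not a TCP implementation: it assumes no loss, one ACK per segment sent, and a window counted in whole segments. The function name `slow_start_windows` is hypothetical.

```python
def slow_start_windows(rtts: int, initial_window: int = 1) -> list[int]:
    """Return the congestion window (in segments) at the start of each RTT.

    Assumes no loss and one ACK per segment sent (illustrative only).
    """
    cong_win = initial_window
    history = []
    for _ in range(rtts):
        history.append(cong_win)
        acks = cong_win          # every segment sent this RTT is ACKed
        for _ in range(acks):
            cong_win += 1        # +1 segment (1 MSS) per ACK received
    return history


print(slow_start_windows(4))     # [1, 2, 4, 8]
```

Adding one segment per ACK is what produces the doubling: a window of n segments yields n ACKs, so the window is 2n one RTT later.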

Page 47: Chapter 3: Transport Layer

Refinement

After 3 dup ACKs: CongWin is cut in half; window then grows linearly (congestion avoidance)

But after timeout event: CongWin instead set to 1 MSS; window then grows exponentially (slow start) to a threshold, then grows linearly (congestion avoidance)

Page 48: Chapter 3: Transport Layer

Summary: TCP Congestion Control

When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.

When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.

When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.

When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
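The four rules can be read as a small state update. The sketch below is an illustration under stated assumptions, not the behaviour of any real TCP stack: the class name `CongestionState`, its method names, and the default Threshold of 64 are hypothetical, and windows are measured in units of MSS. The slide does not say what happens when CongWin equals Threshold; the sketch treats that case as congestion avoidance. The congestion-avoidance step uses the common per-ACK increment of 1/CongWin, which adds up to roughly 1 MSS per RTT.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cong_win: float = 1.0      # congestion window, in units of MSS
    threshold: float = 64.0    # assumed initial Threshold, in units of MSS

    def on_new_ack(self) -> None:
        if self.cong_win < self.threshold:
            self.cong_win += 1.0                   # slow start: +1 MSS per ACK
        else:
            self.cong_win += 1.0 / self.cong_win   # congestion avoidance: ~+1 MSS per RTT

    def on_triple_dup_ack(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold             # continue in congestion avoidance

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0                        # restart from slow start


state = CongestionState(cong_win=16.0, threshold=8.0)
state.on_triple_dup_ack()
print(state.cong_win, state.threshold)   # 8.0 8.0
state.on_timeout()
print(state.cong_win, state.threshold)   # 1.0 4.0
```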

Page 49: Chapter 3: Transport Layer

TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start.

Let W be the window size when loss occurs.

When window is W, throughput is W/RTT.

Just after loss, window drops to W/2, throughput to W/(2 RTT).

Average throughput: 0.75 W/RTT
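The 0.75 factor comes from averaging the sawtooth: if the window climbs linearly from W/2 back to W between losses, its mean is (W/2 + W)/2 = 0.75 W. A quick numerical check, as a sketch under exactly that idealised sawtooth assumption (the function name is hypothetical):

```python
def average_sawtooth_throughput(w: float, rtt: float, steps: int = 1000) -> float:
    """Mean throughput (segments per second) over one linear ramp from W/2 to W."""
    # Sample the window at evenly spaced points along the ramp.
    windows = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    mean_window = sum(windows) / len(windows)
    return mean_window / rtt


w, rtt = 100.0, 0.1
print(average_sawtooth_throughput(w, rtt))   # close to 0.75 * w / rtt = 750
```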

Page 50: Chapter 3: Transport Layer

TCP Futures

Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate L:

$\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

Required loss rate: $L = 2 \cdot 10^{-10}$. Wow!

New versions of TCP for high-speed needed!
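The two numbers on this slide can be reproduced with a few lines of arithmetic. The sketch assumes the usual loss-based approximation throughput = 1.22 · MSS / (RTT · √L) and takes 1 Gbps as 10^9 bits per second; the function names are hypothetical.

```python
def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(int(required_window_segments(10e9, 0.1, 1500)))   # 83333
print(required_loss_rate(10e9, 0.1, 1500))              # about 2e-10
```

A loss rate of about 2 in 10 billion segments is far below what typical paths deliver, which is the motivation for high-speed TCP variants.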

Page 51: Chapter 3: Transport Layer

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Page 52: Chapter 3: Transport Layer

Why is TCP fair?

Two competing sessions:

Additive increase gives slope of 1, as throughput increases

Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
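The convergence toward the equal-share line can be checked with a toy simulation. This is a sketch under simplifying assumptions that are not in the slides: both flows have the same RTT, both detect loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units. The function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Simulate two synchronised AIMD flows sharing one bottleneck.

    Each round: if the combined rate exceeds capacity, both flows see a loss
    and halve; otherwise both add 1 unit. Synchronised loss is an assumption.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase
    return x1, x2


a, b = aimd_two_flows(1.0, 50.0, capacity=100.0, rounds=1000)
print(abs(a - b))   # shrinks toward 0: the gap halves at every loss
```

Additive increase leaves the gap between the two flows unchanged, while each multiplicative decrease halves it, so the rates drift toward an equal share.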

Page 53: Chapter 3: Transport Layer

Chapter 3: Summary

principles behind transport layer services:

multiplexing, demultiplexing

reliable data transfer

flow control

congestion control

instantiation and implementation in the Internet: UDP, TCP

Next: leaving the network "edge" (application, transport layers), into the network "core"

  • Chapter 3: Transport Layer
  • Transport services and protocols
  • Multiplexing/demultiplexing
  • How demultiplexing works
  • Connectionless demux (cont)
  • Connection-oriented demux (cont)
  • UDP: User Datagram Protocol [RFC 768]
  • UDP: more
  • UDP checksum
  • Principles of Reliable data transfer
  • Rdt1.0: reliable transfer over a reliable channel
  • Rdt2.0: channel with bit errors
  • rdt2.0 has a fatal flaw
  • rdt2.1
  • rdt2.2: a NAK-free protocol
  • rdt3.0: channels with errors and loss
  • Pipelined protocols
  • Go-Back-N
  • Slide 19
  • Selective Repeat
  • Selective repeat: sender, receiver windows
  • Selective repeat
  • TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)
  • TCP segment structure
  • TCP seq. #'s and ACKs
  • TCP Round Trip Time and Timeout
  • Example RTT estimation
  • Slide 28
  • TCP reliable data transfer
  • TCP sender events
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • TCP ACK generation [RFC 1122, RFC 2581]
  • Fast Retransmit
  • TCP Flow Control
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causes/costs of congestion: scenario 1
  • Causes/costs of congestion: scenario 2
  • Causes/costs of congestion: scenario 3
  • Approaches towards congestion control
  • TCP Congestion Control
  • TCP AIMD
  • TCP Slow Start
  • Refinement
  • Summary: TCP Congestion Control
  • TCP throughput
  • TCP Futures
  • TCP Fairness
  • Why is TCP fair?
  • Chapter 3: Summary