Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides...

80
Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross

Transcript of Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides...

Page 1: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer UDP and TCP

CS491G Computer Networking Lab

V Arun

Transport Layer 3-1Slides adapted from Kurose and Ross

Transport Layer 3-2

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-3

Transport services and protocols

provide logical communication between app processes running on different hosts

transport protocols run in end systems send side breaks app

messages into segments passes to network layer

recv side reassembles segments into messages passes to app layer

more than one transport protocol available to apps Internet TCP and UDP

application

transportnetworkdata linkphysical

logical end-end transport

application

transportnetworkdata linkphysical

Transport Layer 3-4

Transport vs network layer network layer

logical communication between hosts

transport layer logical communication between processes relies on and

enhances network layer services

12 kids in Annrsquos house sending letters to 12 kids in Billrsquos house

hosts = houses processes = kids app messages =

letters in envelopes transport protocol =

Ann and Bill who demux to in-house siblings

network-layer protocol = postal service

household analogy

Transport Layer 3-5

Internet transport-layer protocols

reliable in-order delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not

available delay guarantees bandwidth

guarantees

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

logical end-end transport

Transport Layer 3-6

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-7

Multiplexingdemultiplexing

process

socket

use header info to deliverreceived segments to correct socket

demultiplexing at receiverhandle data from multiplesockets add transport header (later used for demultiplexing)

multiplexing at sender

transport

application

physical

link

network

P2P1

transport

application

physical

link

network

P4

transport

application

physical

link

network

P3

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 2: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-2

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-3

Transport services and protocols

provide logical communication between app processes running on different hosts

transport protocols run in end systems send side breaks app

messages into segments passes to network layer

recv side reassembles segments into messages passes to app layer

more than one transport protocol available to apps Internet TCP and UDP

application

transportnetworkdata linkphysical

logical end-end transport

application

transportnetworkdata linkphysical

Transport Layer 3-4

Transport vs network layer network layer

logical communication between hosts

transport layer logical communication between processes relies on and

enhances network layer services

12 kids in Annrsquos house sending letters to 12 kids in Billrsquos house

hosts = houses processes = kids app messages =

letters in envelopes transport protocol =

Ann and Bill who demux to in-house siblings

network-layer protocol = postal service

household analogy

Transport Layer 3-5

Internet transport-layer protocols

reliable in-order delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not

available delay guarantees bandwidth

guarantees

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

logical end-end transport

Transport Layer 3-6

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-7

Multiplexingdemultiplexing

process

socket

use header info to deliverreceived segments to correct socket

demultiplexing at receiverhandle data from multiplesockets add transport header (later used for demultiplexing)

multiplexing at sender

transport

application

physical

link

network

P2P1

transport

application

physical

link

network

P4

transport

application

physical

link

network

P3

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 3: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-3

Transport services and protocols

provide logical communication between app processes running on different hosts

transport protocols run in end systems send side breaks app

messages into segments passes to network layer

recv side reassembles segments into messages passes to app layer

more than one transport protocol available to apps Internet TCP and UDP

application

transportnetworkdata linkphysical

logical end-end transport

application

transportnetworkdata linkphysical

Transport Layer 3-4

Transport vs network layer network layer

logical communication between hosts

transport layer logical communication between processes relies on and

enhances network layer services

12 kids in Annrsquos house sending letters to 12 kids in Billrsquos house

hosts = houses processes = kids app messages =

letters in envelopes transport protocol =

Ann and Bill who demux to in-house siblings

network-layer protocol = postal service

household analogy

Transport Layer 3-5

Internet transport-layer protocols

reliable in-order delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not

available delay guarantees bandwidth

guarantees

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

logical end-end transport

Transport Layer 3-6

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-7

Multiplexingdemultiplexing

process

socket

use header info to deliverreceived segments to correct socket

demultiplexing at receiverhandle data from multiplesockets add transport header (later used for demultiplexing)

multiplexing at sender

transport

application

physical

link

network

P2P1

transport

application

physical

link

network

P4

transport

application

physical

link

network

P3

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 4: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-4

Transport vs network layer network layer

logical communication between hosts

transport layer logical communication between processes relies on and

enhances network layer services

12 kids in Annrsquos house sending letters to 12 kids in Billrsquos house

hosts = houses processes = kids app messages =

letters in envelopes transport protocol =

Ann and Bill who demux to in-house siblings

network-layer protocol = postal service

household analogy

Transport Layer 3-5

Internet transport-layer protocols

reliable in-order delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not

available delay guarantees bandwidth

guarantees

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

logical end-end transport

Transport Layer 3-6

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-7

Multiplexingdemultiplexing

process

socket

use header info to deliverreceived segments to correct socket

demultiplexing at receiverhandle data from multiplesockets add transport header (later used for demultiplexing)

multiplexing at sender

transport

application

physical

link

network

P2P1

transport

application

physical

link

network

P4

transport

application

physical

link

network

P3

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 5: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-5

Internet transport-layer protocols

reliable in-order delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not

available delay guarantees bandwidth

guarantees

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

logical end-end transport

Transport Layer 3-6

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-7

Multiplexingdemultiplexing

process

socket

use header info to deliverreceived segments to correct socket

demultiplexing at receiverhandle data from multiplesockets add transport header (later used for demultiplexing)

multiplexing at sender

transport

application

physical

link

network

P2P1

transport

application

physical

link

network

P4

transport

application

physical

link

network

P3

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 6: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-6

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-7

Multiplexingdemultiplexing

process

socket

use header info to deliverreceived segments to correct socket

demultiplexing at receiverhandle data from multiplesockets add transport header (later used for demultiplexing)

multiplexing at sender

transport

application

physical

link

network

P2P1

transport

application

physical

link

network

P4

transport

application

physical

link

network

P3

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 7: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-7

Multiplexingdemultiplexing

process

socket

use header info to deliverreceived segments to correct socket

demultiplexing at receiverhandle data from multiplesockets add transport header (later used for demultiplexing)

multiplexing at sender

transport

application

physical

link

network

P2P1

transport

application

physical

link

network

P4

transport

application

physical

link

network

P3

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 8: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-8

How demultiplexing works

host receives IP datagrams each datagram has source and

destination IP address each datagram carries one

transport-layer segment each segment has source and

destination port number host uses IP addresses amp port

numbers to direct segment to right socket

source port dest port

32 bits

applicationdata

(payload)

other header fields

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 9: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-9

Connectionless demultiplexing

recall created socket has host-local port

DatagramSocket mySocket1 = new DatagramSocket(12534)

when host receives UDP segment checks destination IP

and port in segment directs UDP segment

to socket bound to that (IPport)

recall when creating datagram to send into UDP socket must specify

destination IP address destination port IP datagrams with same dest (IP port) but different source IP addresses andor source port numbers will be directed to same socket

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 10: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-10

Connectionless demux example

DatagramSocket serverSocket = new DatagramSocket

(6428)

transport

application

physical

link

network

P3transport

application

physical

link

network

P1

transport

application

physical

link

network

P4

DatagramSocket mySocket1 = new DatagramSocket (5775)

DatagramSocket mySocket2 = new DatagramSocket (9157)

source port 9157dest port 6428

source port 6428dest port 9157

source port dest port

source port dest port

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 11: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-11

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

demux receiver uses all four values to direct segment to right socket

server host has many simultaneous TCP sockets each socket identified

by its own 4-tuple web servers have

different socket each client non-persistent HTTP

will have different socket for each request

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 12: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-12

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

P4

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

network

P6P5P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

three segments all destined to IP address B dest port 80 are demultiplexed to different sockets

server IP

address B

server socket also port 80

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 13: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-13

Connection-oriented demux example

transport

application

physical

link

network

P3transport

app

physical

link

transport

application

physical

link

network

P2

source IPport A9157dest IP port B80

source IPport B80dest IPport A9157

host IP address

A

host IP address

C

server IP

address B

network

P3

source IPport C5775dest IPport B80

source IPport C9157dest IPport B80

P4

server socket also port 80threaded server

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 14: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-14

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 15: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768] no frills bare bones

transport protocol for ldquobest effortrdquo service UDP segments may be lost delivered out-of-

order connectionless

no sender-receiver handshaking

each UDP segment handled independently

UDP uses streaming

multimedia apps (loss tolerant rate sensitive)

DNS SNMP

reliable transfer over UDP add reliability at

application layer application-specific

error recovery

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 16: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-16

UDP segment header

source port dest port

32 bits

applicationdata

(payload)

UDP segment format

length checksum

length in bytes of UDP segment

including header

no connection establishment (which can add delay)

simple no connection state at sender receiver

small header size no congestion control

UDP can blast away as fast as desired

why is there a UDP

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 17: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-17

UDP checksum

sender treat segment contents

including header fields as sequence of 16-bit integers

checksum addition (onersquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (flipped bits) in segments

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 18: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-18

Internet checksum exampleexample add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Note when adding numbers a carryout from the most significant bit needs to be added to the

result

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 19: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Q1 Sockets and multiplexing TCP uses more information in packet

headers in order to demultiplex packets compared to UDPA TrueB False

Transport Layer 3-19

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 20: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Q2 Sockets UDP Suppose we use UDP instead of TCP under

HTTP for designing a web server where all requests and responses fit in a single packet Suppose a 100 clients are simultaneously communicating with this web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-20

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 21: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Q3 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server How many sockets are respectively at the server and at each clientA 11B 21C 2002D 1001E 101 1

Transport Layer 3-21

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 22: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Q4 Sockets TCP

Suppose a 100 clients are simultaneously communicating with (a traditional HTTPTCP) web server Do all of the sockets at the server have the same server-side port numberA YesB No

Transport Layer 3-22

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 23: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Q5 UDP checksums Letrsquos denote a UDP packet as (checksum

data) ignoring other fields for this question Suppose a sender sends (0010 1110) and the receiver receives (00111110) Which of the following is true of the receiverA Thinks the packet is corrupted and

discards the packetB Thinks only the checksum is corrupted

and delivers the correct data to the application

C Can possibly conclude that nothing is wrong with the packet

D A and CTransport Layer 3-23

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 24: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-24

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 25: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-25

TCP Overview RFCs 79311221323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) inits sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order

byte steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 26: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-26

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement number

receive window

Urg data pointerchecksum

FSRPAUheadlen

notused

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 27: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-27

TCP seq numbers ACKssequence numbers

byte stream ldquonumberrdquo of first byte in segmentrsquos data

acknowledgementsseq of next byte expected from other side

cumulative ACKQ how receiver handles out-of-order segmentsA TCP spec doesnrsquot say - up to implementor

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

incoming segment to sender

A

sent ACKed

sent not-yet ACKed(ldquoin-flightrdquo)

usablebut not yet sent

not usable

window size N

sender sequence number space

source port dest port

sequence number

acknowledgement number

checksum

rwnd

urg pointer

outgoing segment from sender

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 28: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-28

TCP seq numbers ACKs

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

simple telnet scenario

Host BHost A

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 29: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-29

TCP round trip time timeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short

premature timeout unnecessary retransmissions

too long slow reaction to segment loss

Q how to estimate RTT

SampleRTT measured time from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several

recent measurements not just current SampleRTT

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 30: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-30

RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- )EstimatedRTT + SampleRTT exponential weighted moving average influence of past sample decreases

exponentially fast typical value = 0125

TCP round trip time timeout

RTT

(mill

iseco

nds)

RTT gaiacsumassedu to fantasiaeurecomfr

sampleRTT

EstimatedRTT

time (seconds)

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 31: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-31

timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin

estimate SampleRTT deviation from EstimatedRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

TCP round trip time timeout

(typically = 025)

TimeoutInterval = EstimatedRTT + 4DevRTT

estimated RTT ldquosafety marginrdquo

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 32: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-32

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 33: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-33

TCP reliable data transfer TCP creates rdt

service on top of IPrsquos unreliable service pipelined segments cumulative acks

bull selective acks often supported as an option

single retransmission timer

retransmissions triggered by timeout events duplicate acks

letrsquos initially consider simplified TCP sender ignore duplicate acks ignore flow control

congestion control

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 34: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-34

TCP sender eventsdata rcvd from app create segment

with seq (= byte-stream number of first data byte in segment)

start timer if not already running (for oldest unacked segment) TimeOutInterval =

smoothed_RTT + 4deviation_RTT

timeout retransmit segment

that caused timeout restart timer ack rcvd if ack acknowledges

previously unacked segments update what is

known to be ACKed (re-)start timer if still

unacked segments

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 35: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-35

TCP sender (simplified)

waitfor

event

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

create segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running) start timer

data received from application above

retransmit not-yet-acked segment with smallest seq start timer

timeout

if (y gt SendBase) SendBase = y SendBasendash1 last cumulatively ACKed byte if (there are currently not-yet-acked segments) (re-)start timer else stop timer

ACK received with ACK field value y

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 36: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-36

TCP retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8 bytes of data

Xtim

eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=92 8bytes of data

tim

eo

ut

ACK=120

Seq=100 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 37: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-37

TCP retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

Seq=120 15 bytes of data

tim

eo

ut

Seq=100 20 bytes of data

ACK=120

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 38: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-38

TCP ACK generation [RFC 1122 RFC

2581]

event at receiver

arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

arrival of in-order segment withexpected seq One other segment has ACK pending

arrival of out-of-order segmenthigher-than-expect seq Gap detected

arrival of segment that partially or completely fills gap

TCP receiver action

delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

immediately send single cumulative ACK ACKing both in-order segments

immediately send duplicate ACK indicating seq of next expected byte

immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 39: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-39

TCP fast retransmit time-out period

often relatively long long delay before

resending lost packet detect lost

segments via duplicate ACKs sender often sends

many segments back-to-back

if segment is lost there will likely be many duplicate ACKs

if sender receives 3 ACKs for same data(ldquotriple duplicate ACKsrdquo) resend unacked segment with smallest seq

likely that unacked segment lost so donrsquot wait for timeout

TCP fast retransmit

(ldquotriple duplicate ACKsrdquo)

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 40: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-40

X

fast retransmit after sender receipt of triple duplicate ACK

Host BHost A

Seq=92 8 bytes of data

ACK=100

tim

eo

ut ACK=100

ACK=100

ACK=100

TCP fast retransmit

Seq=100 20 bytes of data

Seq=100 20 bytes of data

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 41: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-41

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 42: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-42

TCP flow controlapplication

process

TCP socketreceiver buffers

TCPcode

IPcode

application

OS

receiver protocol stack

application may remove data from

TCP socket buffers hellip

hellip slower than TCP

receiver is delivering(sender is sending)

from sender

receiver controls sender so sender wonrsquot overflow receiverrsquos buffer by transmitting too much too fast

flow control

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 43: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-43

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

receiver ldquoadvertisesrdquo free buffer space by including rwnd value in TCP header of receiver-to-sender segments RcvBuffer size can be

set via socket options most operating systems

auto-adjust RcvBuffer sender limits amount

of unacked (ldquoin-flightrdquo) data to receiverrsquos rwnd value to ensure receive buffer will not overflow

receiver-side buffering

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 44: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-44

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 45: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-45

Connection Managementbefore exchanging data senderreceiver

ldquohandshakerdquo agree to establish connection (each knowing the

other willing to establish connection) agree on connection parameters

connection state ESTABconnection variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

connection state ESTABconnection Variables

seq client-to-server server-to-clientrcvBuffer size at serverclient

application

network

Socket clientSocket = newSocket(hostnameport

number)

Socket connectionSocket = welcomeSocketaccept()

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 46: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-46

Q will 2-way handshake always work in network

variable delays retransmitted messages

(eg req_conn(x)) due to message loss

message reordering canrsquot ldquoseerdquo other side

2-way handshake

Letrsquos talk

OKESTAB

ESTAB

choose xreq_conn(x)

ESTAB

ESTABacc_conn(x)

Agreeing to establish a connection

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 47: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-47

Agreeing to establish a connection

2-way handshake failure scenarios

retransmitreq_conn(

x)

ESTAB

req_conn(x)

half open connection(no client)

client terminat

es

serverforgets x

connection x completes

retransmitreq_conn(

x)

ESTAB

req_conn(x)

data(x+1)

retransmitdata(x+1)

acceptdata(x+1)

choose xreq_conn(x)

ESTAB

ESTAB

acc_conn(x)

client terminat

es

ESTAB

choose xreq_conn(x)

ESTAB

acc_conn(x)

data(x+1) acceptdata(x+1)

connection x completes server

forgets x

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 48: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-48

TCP 3-way handshake

SYNbit=1 Seq=x

choose init seq num xsend TCP SYN msg

ESTAB

SYNbit=1 Seq=yACKbit=1 ACKnum=x+1

choose init seq num ysend TCP SYNACKmsg acking SYN

ACKbit=1 ACKnum=y+1

received SYNACK(x) indicates server is livesend ACK for SYNACK

this segment may contain client-to-server data

received ACK(y) indicates client is live

SYNSENT

ESTAB

SYN RCVD

client state

LISTEN

server state

LISTEN

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 49: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-49

TCP 3-way handshake FSM

closed

listen

SYNrcvd

SYNsent

ESTAB

Socket clientSocket = newSocket(hostnameport

number)

SYN(seq=x)

Socket connectionSocket = welcomeSocketaccept()

SYN(x)

SYNACK(seq=yACKnum=x+1)create new socket for

communication back to client

SYNACK(seq=yACKnum=x+1)

ACK(ACKnum=y+1)ACK(ACKnum=y+1)

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 50: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-50

TCP closing a connection client server each close their side of

connection send TCP segment with FIN bit = 1

respond to received FIN with ACK on receiving FIN ACK can be combined with

own FIN simultaneous FIN exchanges can be

handled

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 51: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-51

FIN_WAIT_2

CLOSE_WAIT

FINbit=1 seq=y

ACKbit=1 ACKnum=y+1

ACKbit=1 ACKnum=x+1 wait for server

close

can stillsend data

can no longersend data

LAST_ACK

CLOSED

TIMED_WAIT

timed wait for 2max

segment lifetime

CLOSED

TCP closing a connection

FIN_WAIT_1 FINbit=1 seq=xcan no longersend but can receive data

clientSocketclose()

client state server state

ESTABESTAB

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 52: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

TCP Overall state machine

Transport Layer 3-52

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 53: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-53

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 54: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-54

congestion informally ldquotoo many sources sending

too much data too fast for network to handlerdquo

different from flow control manifestations

lost packets (buffer overflow at routers)

long delays (queueing in router buffers)

a top-10 problem

Principles of congestion control

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 55: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-55

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

output link capacity R

no retransmission

maximum per-connection throughput R2

unlimited shared output link buffers

Host A

original data in

Host B

throughputout

R2

R2

out

in R2d

ela

yin

large delays as arrival rate in approaches capacity

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 56: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-56

one router finite buffers sender retransmission of timed-out packet

app-layer input = app-layer outputin = out

transport-layer input includes retransmissions rsquoin

ge in

finite shared output link buffers

Host A

in original data

Host B

outin original data plus

retransmitted data

Causescosts of congestion scenario 2

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 57: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-57

idealization perfect knowledge

sender sends only when router buffers available

finite shared output link buffers

in original dataoutin original data plus

retransmitted data

copy

free buffer space

R2

R2

out

in

Causescosts of congestion scenario 2

Host B

A

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 58: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-58

in original dataoutin original data plus

retransmitted data

copy

no buffer space

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

Causescosts of congestion scenario 2

A

Host B

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 59: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-59

in original dataoutin original data plus

retransmitted data

free buffer space

Causescosts of congestion scenario 2

Idealization known loss packets can be lost dropped at router due to full buffers

sender only resends if packet known to be lost

R2

R2in

out

when sending at R2 some packets are retransmissions but asymptotic goodput is still R2 (why)

A

Host B

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 60: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-60

A

in outincopy

free buffer space

timeout

R2

R2in

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

Host B

Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Causescosts of congestion scenario 2

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 61: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-61

R2

out

when sending at R2 some packets are retransmissions including duplicated that are delivered

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple

copies of pkt decreasing goodput

R2in

Causescosts of congestion scenario 2 Realistic duplicates packets can be lost

dropped at router due to full buffers

sender times out prematurely sending two copies both of which are delivered

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 62: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-62

four senders multihop paths timeoutretransmit

Q what happens as in and rsquo

in increase

finite shared output link buffers

Host A out

Causescosts of congestion scenario 3

Host B

Host CHost D

in original data

in original data plus

retransmitted data

A as red rsquoin increases all

arriving blue pkts at upper queue are dropped blue throughput 0

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 63: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-63

another ldquocostrdquo of congestion when packet dropped any ldquoupstream

bandwidth used for that packet wasted

Causescosts of congestion scenario 3

C2

C2

ou

t

inrsquo

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 64: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-64

Approaches towards congestion controltwo broad approaches towards congestion

controlend-end

congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

network-assisted congestion control

routers provide feedback to end systemssingle bit indicating congestion (SNA DECbit TCPIP ECN ATM)

explicit rate for sender to send at

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 65: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-65

Case study ATM ABR congestion control

ABR available bit rate

ldquoelastic servicerdquo if senderrsquos path

ldquounderloadedrdquo sender should

use available bandwidth

if senderrsquos path congested sender throttled

to minimum guaranteed rate

RM (resource management) cells

sent by sender interspersed with data cells

bits in RM cell set by switches (ldquonetwork-assistedrdquo) NI bit no increase in rate

(mild congestion) CI bit congestion

indication RM cells returned to sender

by receiver with bits intact

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 66: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-66

Case study ATM ABR congestion control

two-byte ER (explicit rate) field in RM cell congested switch may lower ER value in cell sendersrsquo send rate thus max supportable rate on path

EFCI bit in data cells set to 1 in congested switch if data cell preceding RM cell has EFCI set receiver

sets CI bit in returned RM cell

RM cell data cell

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 67: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-67

Transport Layer Outline

1 transport-layer services

2 multiplexing and demultiplexing

3 connectionless transport UDP

4 connection-oriented transport TCP segment structure reliable data

transfer flow control connection

management5 principles of

congestion control6 TCP congestion

control

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 68: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-68

TCP congestion control additive increase multiplicative decrease

approach sender increases transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1

MSS every RTT until loss detected multiplicative decrease cut cwnd in half

after loss

cwnd

TC

P s

ende

r co

nges

tion

win

dow

siz

e

AIMD saw toothbehavior probing

for bandwidth

additively increase window size helliphellip until loss occurs (then cut window in half)

time

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 69: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-69

TCP congestion control window

sender limits transmission

cwnd is dynamic function of perceived congestion

TCP sending rate roughly send

cwnd bytes wait RTT for ACKS then send more bytes

last byteACKed sent not-

yet ACKed(ldquoin-flightrdquo)

last byte sent

cwnd

LastByteSent -LastByteAcked

lt cwnd

sender sequence number space

rate ~~cwnd

RTTbytessec

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 70: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-70

TCP Slow Start when connection

begins increase rate exponentially until first loss event initially cwnd = 1 MSS double cwnd every

RTT done by incrementing cwnd upon every ACK

summary initial rate is slow but ramps up exponentially fast

Host A

one segment

RT

T

Host B

time

two segments

four segments

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 71: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-71

TCP detecting reacting to loss

loss indicated by timeout cwnd set to 1 MSS window then grows exponentially (as in slow start) to threshold

then grows linearly loss indicated by 3 duplicate ACKs TCP RENO

dup ACKs indicate network capable of delivering some segments cwnd is cut in half window then grows linearly

TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 72: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-72

Q when should the exponential increase switch to linear

A when cwnd gets to 12 of its value before timeout

Implementation variable ssthresh on loss event ssthresh is set to 12 of cwnd just before loss event

TCP slow start cong avoidance

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 73: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-73

Summary TCP Congestion Control

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0

transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3retransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2

cwnd = 1 MSSdupACKcount = 0

retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSS

ssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 74: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-74

TCP throughput Simplistic model avg TCP thruput as function of window

size RTT ignore slow start assume always data to send

W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg throughput is 34W per RTT

W

W2

avg TCP thruput = 34W

RTTbytessec

In practice W not known or fixed so this model is too simplistic to be useful

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 75: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-75

TCP throughput More practical model Throughput in terms of segment loss

probability L round-trip time T and maximum segment size M [Mathis et al 1997]

TCP throughput = 122 MT L

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 76: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-76

TCP futures TCP over ldquolong fat pipesrdquo example 1500 byte segments 100ms

RTT want 10 Gbps throughput requires W = 83333 in-flight segments

as per the throughput formula

to achieve 10 Gbps throughput need a loss rate of L = 210-10 ndash an unrealistically small loss rate

new versions of TCP for high-speed

TCP throughput = 122 MSSRTT L

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 77: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

TCP throughput wrap-up

Assume sender window cwnd receiver window rwnd bottleneck capacity C round-trip time T path loss rate L maximum segment size MSS Then Instantaneous TCP throughput =

bull min(C cwndTrwndT) Steady-state TCP throughput =

bull min(C 122M(TradicL))

Transport Layer 3-77

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 78: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-78

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP Fairness

TCP connection 2

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 79: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-79

Why is TCP fairtwo competing sessions additive increase gives slope of 1 as throughout

increases multiplicative decrease decreases throughput

proportionally R

R

equal bandwidth share

Connection 1 throughput

Con

nect

ion

2 th

roug

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
Page 80: Transport Layer: UDP and TCP CS491G: Computer Networking Lab V. Arun Transport Layer 3-1 Slides adapted from Kurose and Ross.

Transport Layer 3-80

Fairness (more)Fairness and UDP multimedia apps

often do not use TCP rate throttling by

congestion control can hurt streaming quality

instead use UDP send audiovideo

at constant rate tolerate packet loss

Fairness parallel TCP connections

application can open many parallel connections between two hosts

web browsers do this eg link of rate R with 9

existing connections new app asks for 1 TCP gets

R10 new app asks for 11 TCPs gets

R2

  • Transport Layer UDP and TCP
  • Transport Layer Outline
  • Transport services and protocols
  • Transport vs network layer
  • Internet transport-layer protocols
  • Slide 6
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux example
  • Connection-oriented demux
  • Connection-oriented demux example
  • Slide 13
  • Slide 14
  • UDP User Datagram Protocol [RFC 768]
  • UDP segment header
  • UDP checksum
  • Internet checksum example
  • Q1 Sockets and multiplexing
  • Q2 Sockets UDP
  • Q3 Sockets TCP
  • Q4 Sockets TCP
  • Q5 UDP checksums
  • Slide 24
  • TCP Overview RFCs 79311221323 2018 2581
  • TCP segment structure
  • TCP seq numbers ACKs
  • Slide 28
  • TCP round trip time timeout
  • Slide 30
  • Slide 31
  • Slide 32
  • TCP reliable data transfer
  • TCP sender events
  • TCP sender (simplified)
  • TCP retransmission scenarios
  • Slide 37
  • TCP ACK generation [RFC 1122 RFC 2581]
  • TCP fast retransmit
  • Slide 40
  • Slide 41
  • TCP flow control
  • Slide 43
  • Slide 44
  • Connection Management
  • Agreeing to establish a connection
  • Slide 47
  • TCP 3-way handshake
  • TCP 3-way handshake FSM
  • TCP closing a connection
  • Slide 51
  • TCP Overall state machine
  • Slide 53
  • Principles of congestion control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Causescosts of congestion scenario 3
  • Slide 63
  • Approaches towards congestion control
  • Case study ATM ABR congestion control
  • Slide 66
  • Slide 67
  • TCP congestion control additive increase multiplicative decrease
  • TCP congestion control window
  • TCP Slow Start
  • TCP detecting reacting to loss
  • TCP slow start cong avoidance
  • Summary TCP Congestion Control
  • TCP throughput Simplistic model
  • TCP throughput More practical model
  • TCP futures TCP over ldquolong fat pipesrdquo
  • TCP throughput wrap-up
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)