Transport Layer. Transport Layer – Topics r Review: multiplexing, connection and connectionless...

135
Transport Layer
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    263
  • download

    0

Transcript of Transport Layer. Transport Layer – Topics r Review: multiplexing, connection and connectionless...

Transport Layer

Transport Layer ndash Topics Review multiplexing connection and

connectionless transport services provided by a transport layer

UDP Tools for transport layer

Error detection ACKNACK ARQ Approaches to transport

Go-Back-N Selective repeat

TCP Services TCP Connection setup acks and seq num timeout and

triple-dup ack slow-start congestion avoidance

Transport layer

Transfers messages between application in hosts For ftp you exchange files and directory information For http you exchange requests and repliesfiles For smtp messages are exchanged

Services possibly provided Reliability Error detectioncorrection Flowcongestion control Multiplexing (support several messages being

transported simultaneously)

Connection oriented connectionless TCP supports the idea of a connection

Once listen and connect complete there is a logical connection between the hosts

The state of the connection can be determined (the connection is cut or not)

bull But TCP does not have a heartbeat message UDP is connectionless

Packets are just sent There is no concept (supported by the transport layer) of a connection

The application can make a connection over UDP So the application is each host will support the hand-shaking and monitoring the state of the ldquoconnectionrdquo

There are several other transport layer protocols besides TCP and UDP but TCP and UDP are the most popular

TCP vs UCP Connection oriented

Connections must be set up The state of the connection

can be determined Flowcongestion control

Limits congestion in the network and end hosts

Control how fast data can be sent

Larger Packet header Retransmits lost packets and

reports if packets were not successfully transmitted

Check sum for error detection

Connectionless Connections does not need to

be set-up The state of the connection is

unknown No flowcongestion control

Could cause excessive congestion and unfair usage

Data can be sent exactly when it needs to be

Low overhead No feedback provided as to

whether packets were successfully transmitted

Check sum for error detection

Applications and Transport Protocols

Smtpmail TCP telnet TCP http TCP ftp TCP NFS UDP or TCP (why udp I do not know) Multimedia streaming UDP or TCP Voice over ip ndash UDP Routing ndashUDP its own or TCP DNS -UDP

Multiplexing with ports

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport layer packet headers always contain source and destination portIP headers have source and destination IPs

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Transport Layer ndash Topics Review multiplexing connection and

connectionless transport services provided by a transport layer

UDP Tools for transport layer

Error detection ACKNACK ARQ Approaches to transport

Go-Back-N Selective repeat

TCP Services TCP Connection setup acks and seq num timeout and

triple-dup ack slow-start congestion avoidance

Transport layer

Transfers messages between application in hosts For ftp you exchange files and directory information For http you exchange requests and repliesfiles For smtp messages are exchanged

Services possibly provided Reliability Error detectioncorrection Flowcongestion control Multiplexing (support several messages being

transported simultaneously)

Connection oriented connectionless TCP supports the idea of a connection

Once listen and connect complete there is a logical connection between the hosts

The state of the connection can be determined (the connection is cut or not)

bull But TCP does not have a heartbeat message UDP is connectionless

Packets are just sent There is no concept (supported by the transport layer) of a connection

The application can make a connection over UDP So the application is each host will support the hand-shaking and monitoring the state of the ldquoconnectionrdquo

There are several other transport layer protocols besides TCP and UDP but TCP and UDP are the most popular

TCP vs UCP Connection oriented

Connections must be set up The state of the connection

can be determined Flowcongestion control

Limits congestion in the network and end hosts

Control how fast data can be sent

Larger Packet header Retransmits lost packets and

reports if packets were not successfully transmitted

Check sum for error detection

Connectionless Connections does not need to

be set-up The state of the connection is

unknown No flowcongestion control

Could cause excessive congestion and unfair usage

Data can be sent exactly when it needs to be

Low overhead No feedback provided as to

whether packets were successfully transmitted

Check sum for error detection

Applications and Transport Protocols

Smtpmail TCP telnet TCP http TCP ftp TCP NFS UDP or TCP (why udp I do not know) Multimedia streaming UDP or TCP Voice over ip ndash UDP Routing ndashUDP its own or TCP DNS -UDP

Multiplexing with ports

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport layer packet headers always contain source and destination portIP headers have source and destination IPs

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Transport layer

Transfers messages between application in hosts For ftp you exchange files and directory information For http you exchange requests and repliesfiles For smtp messages are exchanged

Services possibly provided Reliability Error detectioncorrection Flowcongestion control Multiplexing (support several messages being

transported simultaneously)

Connection oriented connectionless TCP supports the idea of a connection

Once listen and connect complete there is a logical connection between the hosts

The state of the connection can be determined (the connection is cut or not)

bull But TCP does not have a heartbeat message UDP is connectionless

Packets are just sent There is no concept (supported by the transport layer) of a connection

The application can make a connection over UDP So the application is each host will support the hand-shaking and monitoring the state of the ldquoconnectionrdquo

There are several other transport layer protocols besides TCP and UDP but TCP and UDP are the most popular

TCP vs UCP Connection oriented

Connections must be set up The state of the connection

can be determined Flowcongestion control

Limits congestion in the network and end hosts

Control how fast data can be sent

Larger Packet header Retransmits lost packets and

reports if packets were not successfully transmitted

Check sum for error detection

Connectionless Connections does not need to

be set-up The state of the connection is

unknown No flowcongestion control

Could cause excessive congestion and unfair usage

Data can be sent exactly when it needs to be

Low overhead No feedback provided as to

whether packets were successfully transmitted

Check sum for error detection

Applications and Transport Protocols

Smtpmail TCP telnet TCP http TCP ftp TCP NFS UDP or TCP (why udp I do not know) Multimedia streaming UDP or TCP Voice over ip ndash UDP Routing ndashUDP its own or TCP DNS -UDP

Multiplexing with ports

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport layer packet headers always contain source and destination portIP headers have source and destination IPs

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Connection oriented connectionless TCP supports the idea of a connection

Once listen and connect complete there is a logical connection between the hosts

The state of the connection can be determined (the connection is cut or not)

bull But TCP does not have a heartbeat message UDP is connectionless

Packets are just sent There is no concept (supported by the transport layer) of a connection

The application can make a connection over UDP So the application is each host will support the hand-shaking and monitoring the state of the ldquoconnectionrdquo

There are several other transport layer protocols besides TCP and UDP but TCP and UDP are the most popular

TCP vs UCP Connection oriented

Connections must be set up The state of the connection

can be determined Flowcongestion control

Limits congestion in the network and end hosts

Control how fast data can be sent

Larger Packet header Retransmits lost packets and

reports if packets were not successfully transmitted

Check sum for error detection

Connectionless Connections does not need to

be set-up The state of the connection is

unknown No flowcongestion control

Could cause excessive congestion and unfair usage

Data can be sent exactly when it needs to be

Low overhead No feedback provided as to

whether packets were successfully transmitted

Check sum for error detection

Applications and Transport Protocols

Smtpmail TCP telnet TCP http TCP ftp TCP NFS UDP or TCP (why udp I do not know) Multimedia streaming UDP or TCP Voice over ip ndash UDP Routing ndashUDP its own or TCP DNS -UDP

Multiplexing with ports

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport layer packet headers always contain source and destination portIP headers have source and destination IPs

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP vs UCP Connection oriented

Connections must be set up The state of the connection

can be determined Flowcongestion control

Limits congestion in the network and end hosts

Control how fast data can be sent

Larger Packet header Retransmits lost packets and

reports if packets were not successfully transmitted

Check sum for error detection

Connectionless Connections does not need to

be set-up The state of the connection is

unknown No flowcongestion control

Could cause excessive congestion and unfair usage

Data can be sent exactly when it needs to be

Low overhead No feedback provided as to

whether packets were successfully transmitted

Check sum for error detection

Applications and Transport Protocols

Smtpmail TCP telnet TCP http TCP ftp TCP NFS UDP or TCP (why udp I do not know) Multimedia streaming UDP or TCP Voice over ip ndash UDP Routing ndashUDP its own or TCP DNS -UDP

Multiplexing with ports

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport layer packet headers always contain source and destination portIP headers have source and destination IPs

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Applications and Transport Protocols

Smtpmail TCP telnet TCP http TCP ftp TCP NFS UDP or TCP (why udp I do not know) Multimedia streaming UDP or TCP Voice over ip ndash UDP Routing ndashUDP its own or TCP DNS -UDP

Multiplexing with ports

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport layer packet headers always contain source and destination portIP headers have source and destination IPs

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Multiplexing with ports

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport layer packet headers always contain source and destination portIP headers have source and destination IPs

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

UDP User Datagram Protocol [RFC 768]

ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

ldquobest effortrdquo service UDP segments may be lost delivered out of order

to app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed checksum

equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Principles of Reliable data transfer

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Principles of Reliable data transfer

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Principles of Reliable data transfer

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Reliable data transfer getting started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to

receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to

upper

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Reliable data transfer getting startedWersquoll incrementally develop sender receiver

sides of reliable data transfer protocol (rdt) consider only unidirectional data transfer

but control info will flow on both directions

use finite state machines (FSM) to specify sender receiver

state1

state2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state

uniquely determined by next event

eventactions

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Rdt10 reliable transfer over a reliable channel

underlying channel perfectly reliable no bit errors no loss of packets

separate FSMs for sender receiver sender sends data into underlying channel receiver read data from underlying channel

packet = make_pkt(data)udt_send(packet)

Wait for call from above

rdt_send(data)

sender

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

receiver

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Rdt20 channel with bit errors

underlying channel may flip bits in packets checksum to detect bit errors

the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OK negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errors sender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-

gtsender

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt20 FSM specification

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt20 FSM specification

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

belowsender

receiverrdt_send(data)

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt20 has a fatal flaw

What happens if ACKNAK corrupted

sender doesnrsquot know what happened at receiver

canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits current

pkt if ACKNAK garbled sender adds sequence

number to each pkt receiver discards (doesnrsquot

deliver up) duplicate pkt

Sender sends one packet then waits for receiver response

stop and wait

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt21 sender handles garbled ACKNAKs

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt21 receiver handles garbled ACKNAKs

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt21 sender handles garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0 udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt21 receiver handles garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt21 discussion

Sender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACKNAK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if

received packet is duplicate state indicates

whether 0 or 1 is expected pkt seq

note receiver can not know if its last ACKNAK received OK at sender

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt22 a NAK-free protocol

same functionality as rdt21 using ACKs only instead of NAK receiver sends ACK for last pkt

received OK receiver must explicitly include seq of pkt being

ACKed

duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt22 sender receiver fragments

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt22 sender receiver fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK

0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

What happens if a pkt is duplicated

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt30 channels with errors and loss

New assumption underlying channel can also lose packets (data or ACKs) checksum seq

ACKs retransmissions will be of help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

retransmits if no ACK received in this time

if pkt (or ACK) just delayed (not lost) retransmission will be

duplicate but use of seq rsquos already handles this

receiver must specify seq of pkt being ACKed

requires countdown timer

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt30 sender

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt30 sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

rdt_rcv(rcvpkt)

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

resend pkt1

rdt30 in action

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1

rec ack0

rec pkt1

send ack1rec ack1

send pkt1

rec pkt1

sender receiver

send pkt0

time

rec pkt0send ack0

send pkt1rec ack0

send pkt2

rec pkt2

TO

rec pkt1send ack1

rec ack1

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt30 in action

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1rec pkt1

TO

send ack1

send pkt2rec ack1

send ack1

sender receiver

time

send pkt0

rec pkt0send ack0

rec ack0send pkt1

rec pkt1

send pkt1

rec pkt1

TO

send ack1

send pkt

rec ack1

send ack1

send pkt2

rec pkt2rec ack1

send no pkt (dupACK) send ack2rec ack2

send pkt2

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Performance of rdt30 rdt30 works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit packet and 100bit ACK

What is the total delaybull Data transmission delay

ndash 8000109 = 810-6

bull ACK Transmission delayndash 100109 = 10-7 sec

bull Total Delay ndash 215ms + 008 + 0001=300081ms

Utilization Time transmitting total time 008 300081 = 000027

This is one pkt every 30msec or 33 kBsec over a 1 Gbps link

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

rdt30 stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Pipelining Protocols

Go-back-N big picture Sender can have up

to N unacked packets in pipeline

Rcvr only sends cumulative acks Doesnrsquot ack packet if

therersquos a gap Sender has timer for

oldest unacked packet If timer expires

retransmit all unacked packets

Selective Repeat big pic

Sender can have up to N unacked packets in pipeline

Rcvr acks individual packets

Sender maintains timer for each unacked packet When timer expires

retransmit only unack packet

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Selective repeat big picture

Sender can have up to N unacked packets in pipeline

Receiver acks individual packets Sender maintains timer for each

unacked packet When timer expires retransmit only unack

packet

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in window

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Go-Back-NPkt that could be sent unACKed pkt

Unused pktACKed pkt

send pkt

window

1 unACKed pkts

startwindowN=12

0 unACKed pkts

pkts

send pkts

windowN unACKed pkts

Next pkt to be sent

ACK arrives

windowN=12

Send pkt

N unACKed pkts

windowN-1 unACKed pkts

Sliding window

State of pkts

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Go-Back-N Pkt that could be sent unACKed pkt

Unused pktACKed pkt

window

N unACKed pkts

window

N-1 unACKed pkts

ACK arrives

Send pkt

N unACKed pkts

window

N unACKed pkts

windowNo ACK arrives hellip timeout

0 unACKed pkts

window

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

GBN sender extended Activity Diagram

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

GBN Receiver Activity Diagram

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

GBN sender extended Activity DiagramWaiting for file

Set NSet NextPktToSend=0

Set LastACKed=-1

NextPktToSend ndash LastACKedltN

Send pkt[NextPktToSend] with SeqNum= NextPktToSend NextPktToSend++

Set Timer(NextPktToSend) = Now + TO

Wait

Clear Timers(LastACKed+1 to AN)LastACKed = AN

ACK arrived with ACKNum = AN

otherwise

Clear Timers(LastACKed+1 to NextPktToSend-1)NextPktToSend = LastACKed+1

Timer expires

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

GBN Receiver Activity Diagramstart

Set NextPktToRec = 0Clear ReceiverBufferClear ReceivedPktsReceiverBase = 0

ReceivedPkts[NextPktToRec] == 1

NextPktToRec++Give ReceiverBuffer[NextPktToRec] to app

Send ACK with ACKNum = NextPktToRec - 1

Place Pkt in ReceiverBuffer[SeqNum]ReceivedPkts[SeqNum]=1

wait

otherwise

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

GBN sender extended FSM

Wait start_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

GBN receiver extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

GBN in Action

sender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK=0Rec 1 give to app and Send ACK=1Rec 2 give to app and Send ACK=2

Rec 3 give to app and Send ACK=3

Rec 4 give to app and Send ACK=4

Rec 5 give to app and Send ACK=5

Rec 7 discard and Send ACK=5

Rec 8 discard and Send ACK=5

Rec 9 discard and Send ACK=5

Rec 10 discard and Send ACK=5

Rec 11 discard and Send ACK=5

Rec 12 discard and Send ACK=5

Rec 13 discard and Send ACK=5

Rec 6 give to app and Send ACK=6

Rec 7 give to app and Send ACK=7Rec 8 give to app and Send ACK=8

Rec 9 give to app and Send ACK=9

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Optimal size of N in GBN

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Selective Repeat

receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order

delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq rsquos again limits seq s of sent unACKed pkts

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Selective repeat sender receiver windows

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Selective repeat

data from above if next available seq in

window send pkt

timeout(n) resend pkt n restart

timer

ACK(n) in [sendbasesendbase+N]

mark pkt n as received if n smallest unACKed

pkt advance window base to next unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]

send ACK(n) out-of-order buffer in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)

otherwise ignore

receiver

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Selective repeat in action

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Summary of transport layer tools used so far

ACK and NACK Sequence numbers (and no NACK) Time out Sliding window

Optimal size = bandwidth delay product (if no other flows are using the network)

Cumulative ACK Buffer at the receiver is optional

Selective ACK Requires buffering at the receiver

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size

connection-oriented handshaking (exchange

of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver

reliable in-order byte steam

Pipelined and time-varying window size TCP congestion and

flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

It can be used as a pointer for placing the received data in the receiver buffer

ACKs seq of next byte

expected from other side

cumulative ACK

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo echoesback lsquoCrsquo

timesimple telnet scenario

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Seq no and ACKs

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

Seq no 101ACK no 12Data HELLength 3

Seq no 12ACK no

Data Length 0

Seq no 104ACK no 12Data LO WLength 4

Seq no 12ACK noData

Length 0

104

108

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Seq no and ACKs - bidirectional

110108

H E L L O W O R L D101102103104105106107 109 111

Byte numbers

G O O D B U Y

12 13 14 15 16 17 18

Seq no 101ACK no 12Data HELLength 3

Seq no ACK no

Data GOODLength 4

Seq no ACK no

Data LO WLength 4

Seq no ACK no Data BULength 2

12104

10416

10816

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Round Trip Time and TimeoutQ how to set TCP timeout

value (RTO) If RTO is too short

premature timeout unnecessary

retransmissions If RTO is too long

slow reaction to segment loss

Can RTT be used No RTT varies there is no

single RTT Why does RTT varying

bull Because statistical multiplexing results in queuing

How about using the average RTT

The average is too small since half of the RTTs are larger the average

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value = 0125

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Round Trip Time and TimeoutSetting the timeout (RTO) RTO = EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from EstimatedRTT

RTO = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Round Trip Time and Timeout

RTO = EstimatedRTT + 4DevRTT Might not always work

RTO = max(MinRTO EstimatedRTT + 4DevRTT)

MinRTO = 250 ms for Linux 500 ms for windows

1 sec for BSD

So in most cases RTO = minRTO

Actually when RTOgtMinRTO the performance is quite bad there are many spurious timeoutsNote that RTO was computed in an ad hoc way It is really a signal processing and queuing theory questionhellip

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

RTO details When a pkt is sent the

timer is started unless it is already running

When a new ACK is received the timer is restarted

Thus the timer is for the oldest unACKed pkt Q if RTO=RTT- are there

many spurious timeouts A Not necessarily

(actually yes)

RTO

ACK arrives and so RTO

timer is restarted

RTO

RTO

RTO

bull This shifting of the RTO means that even if RTOltRTT there might not be a timeout

bull However for the first packet sent the timer is started If RTOltRTT of this first packet then there will be a spurious timeout

bull While it is implementation dependent some implementations estimate RTT only once per RTT

bull The RTT of every pkt is not measured bull Instead if no RTT is being measured then the RTT of the next pkt is measured But the

RTT of retransmitted pkts is not measuredbull Some versions of TCP measure RTT more often

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Lost Detectionsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

TO

Send pkt12Send pkt13

Send pkt6Send pkt7Send pkt8Send pkt9

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 12 save in buffer and Send ACK no= 6

Rec 13 save in buffer and Send ACK no=6

Rec 6 give to app and Send ACK no =14

Rec 7 give to app and Send ACK no =14

Rec 8 give to app and Send ACK no =14

Rec 9 give to app and Send ACK no=14

bull It took a long time to detect the loss with RTObull But by examining the ACK no it is possible to

determine that pkt 6 was lostbull Specifically receiving two ACKs with ACK no=6

indicates that segment 6 was lostbull A more conservative approach is to wait for 4 of

the same ACK no (triple-duplicate ACKs) to decide that a packet was lost

bull This is called fast retransmitbull Triple dup-ACK is like a NACK

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Send pkt14

Fast Retransmitsender receiver

Send pkt0Send pkt2Send pkt3

Send pkt4Send pkt5Send pkt6Send pkt7

Send pkt8

Send pkt9

Send pkt10

Send pkt11

Send pkt6Send pkt12

Send pkt13

Send pkt15Send pkt16

Rec 0 give to app and Send ACK no= 1Rec 1 give to app and Send ACK no= 2

Rec 2 give to app and Send ACK no = 3

Rec 3 give to app and Send ACK no =4

Rec 4 give to app and Send ACK no = 5

Rec 5 give to app and Send ACK no = 6

Rec 7 save in buffer and Send ACK no = 6

Rec 8 save in buffer and Send ACK no = 6

Rec 9 save in buffer and Send ACK no = 6

Rec 10 save in buffer and Send ACK no = 6

Rec 11 save in buffer and Send ACK no = 6

Rec 6 save in buffer and Send ACK= 12

Rec 12 save in buffer and Send ACK=13

Rec 13 give to app and Send ACK=14

Rec 14 give to app and Send ACK=15

Rec 15 give to app and Send ACK=16

Rec 16 give to app and Send ACK=17

first dup-ACK

second dup-ACKthird dup-ACK

Retransmit pkt 6

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate

The sender never has more than a receiver windows worth of bytes unACKed

This way the receiver buffer will never overflow

app process may be slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Flow control ndash so the receive doesnrsquot get overwhelmed

The number of unacknowledged packets must be less than the receiver window

As the receivers buffer fills decreases the receiver window

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

Seq=1001Ack=24Data size =0Rwin=9

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31

Application reads buffer

24 25 26 27 28 29 30 31

e

The rBuffer is full

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

buffer

Seq=1001Ack=24Data size =0Rwin=9

Seq=1001Ack=24Data size =0Rwin=9

3 s

Seq=4Ack=1001Data = lsquoersquo size = 1 (bytes)

15Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

24 25 26 27 28 29 30 31Application reads buffer

24 25 26 27 28 29 30 31

e

Seq=4Ack=1001Data = size = 0 (bytes)

window probe

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Seq=20Ack=1001Data = lsquoHirsquo size = 2 (bytes)

Seq=1001Ack=24Data size =0Rwin=0

Seq=22Ack=1001Data = lsquoByrsquo size = 2 (bytes)

Seq=1001Ack=22Data size =0Rwin=2

15

buffer

Seq

SYN had seq=14

16 17 18 19 20 21 22

S t e v e H i

S t e v e H i B y

15 16 17 18 19 20 21 22

Seq=4Ack=1001Data = size = 0 (bytes)

3 s

Seq=1001Ack=24Data size =0Rwin=0

6 s

Seq=4Ack=1001Data = size = 0 (bytes)

Max time between probes is 60 or 64 seconds

The buffer is still full

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Receiver window The receiver window field is 16 bits Default receiver window

By default the receiver window is in units of bytes Hence 64KB is max receiver size for any (default)

implementation Is that enough

bull Recall that the optimal window size is the bandwidth delay productbull Suppose the bit-rate is 100Mbps = 125MBpsbull 2^16 125M = 0005 = 5msecbull If RTT is greater than 5 msec then the receiver window will force

the window to be less than optimalbull Windows 2K had a default window size of 12KB

Receiver window scale During SYN one option is Receiver window scale This option provides the amount to shift the Receiver window Eg Is rec win scale = 4 and rec win=10 then real receiver

window is 10ltlt4 = 160 bytes

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) Establish options and

versions of TCP

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment

server allocates buffers specifies server initial

seq Step 3 client receives SYNACK

replies with ACK segment which may contain data

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

Internetchecksum

(as in UDP)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Connection establishment

Seq no=2197Ack no = xxxxSYN=1ACK=0

Send SYNReset the sequence number

The ACK no is invalid

Seq no = 12ACK no = 2198SYN=1ACK=1

Send SYN-ACK Although no new data has arrived the ACK no is incremented (2197 +

1)

Seq no = 2198ACK no = 13SYN = 0ACK =1

Send ACK (for syn)

Although no new data has arrived the ACK no is

incremented (2197 + 1)

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Connection with losses

SYN

3 sec

SYN

2x3=6 sec

SYN

12 sec

SYN

64 sec

Give up

Total waiting time3+6+12+24+48+64 = 157sec

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

SYN Attackattacker

SYN Reserve memory for TCP connectionMust reserve enough for the receiver buffer

And that must be large enough to support high data rateignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

Victim gives up on first SYN-ACK and frees first chunk of memory

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

SYN Attackattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

157sec

bullTotal memory usage bullMemory per connection x number of SYNs sent in 157 sec

bullNumber of syns sent in 157 sec bull157 x 10Mbps (SYN size x 8) = 157 x 31250 = 5M

bullSuppose Memory per connection = 20KbullTotal memory = 20K x 5M = 100GB hellip machine will crash

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Defense from SYN Attack

bullIf too many SYNs come from the same host ignore themattacker

SYN

ignored SYN-ACK

SYN

SYN

SYN

SYN

SYN

SYN

SYN

ignore

ignore

ignore

ignore

ignore

bullBetter attackbullChange the source address of the SYN to some random address

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

SYN Cookie

Do not allocate memory when the SYN arrives but when the ACK for the SYN-ACK arrives

The attacker could send fake ACKs But the ACK must contain the correct ACK

number Thus the SYN-ACK must contain a sequence

number that is not predictable and does not require saving any information

This is what the SYN cookie method does

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Connection Management (cont)

Closing a connection

Step 1 client end system sends TCP packet with FIN=1 to the server

Step 2 server receives FIN replies with ACK with ACK no incremented Closes connection

The server close its side of the conenction whenever it wants (by send a pkt with FIN=1)

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

On the other hand the host should send as fast as possible (to speed up the file transfer)

a top-10 problem Low quality solution in wired networks Big problems in wireless (especially cellular)

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Ain original data

Host B

out

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A in original data

Host B

out

in original data plus retransmitted data

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Causescosts of congestion scenario 3

four senders multihop paths timeoutretransmit

Q what happens as in increases The total data rate is the sending

rate + the retransmission rate

finite shared output link

buffers

Host A

in original data

Host B

o

utrsquo retransmitted data

A

B

C

D

1 Congestion at A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts

2 This will cause congestion at router B and force host C to increase its sending rate3 And so on

Host C

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream transmission capacity

used for that packet was wasted

Host A

Host B

o

u

t

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Approaches towards congestion control

End-end congestion control

no explicit feedback from network

congestion inferred from end-system observed loss delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to end systems single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

explicit rate sender should send at (XCP)

Two broad approaches towards congestion control

Today the network does not provide help to TCP But this will likely change with wireless data networking

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 outline

31 Transport-layer services

32 Multiplexing and demultiplexing

33 Connectionless transport UDP

34 Principles of reliable data transfer

35 Connection-oriented transport TCP segment structure reliable data transfer flow control connection

management

36 Principles of congestion control

37 TCP congestion control

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP congestion control additive increase multiplicative decrease (AIMD)

In go-back-N the maximum number of unACKed pkts was N In TCP cwnd is the maximum number of unACKed bytes TCP varies the value of cwnd Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs

additive increase increase cwnd by 1 MSS every RTT until loss detectedbull MSS = maximum segment size and may be negotiated during connection establishment Otherwise it is set to 576B

multiplicative decrease cut cwnd in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

time

cwnd

Saw toothbehavior probing

for bandwidth

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

AIMD (approximate description)

When an ACK arrives cwnd = cwnd + MSSfloor(cwndMSS) After cwndMSS acks cwnd=cwnd+1

When a drop is detected via triple duplicate ACK cwnd = MSS floor((cwndMSS)2) cwnd ~ cwnd2

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Congestion Avoidance (AIMD) (approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)

When a drop is detected via triple-dup ACK cwnd = cwnd2cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

4000 1000

4000 2000

SN 3000AN 30Length 1000

4000 3000

SN 4000AN 30Length 1000

4000 4000

SN 30AN 2000RWin 10000

4250 3000SN 5000AN 30Length 1000

4250 4000

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 30004500 4000

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 30004750 4000

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 30005000 4000

5000 5000 SN 9000AN 30Length 1000

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

AIMD (approximately)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

8000 1000

8000 8000

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=5MSS

AN=5MSS

AN=5MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000

8250 80008375 8000

AN=5MSS

AN=5MSS

AN=5MSS4000 5000

4500 4000

4750 4000

SN 4MSS L=1MSS

SN 13MSS L=1MSS

SN 14MSS L=1MSS

AN=13MSS

4000 2000SN 17MSS L=1MSS

SN 18MSS L=1MSS

Drop detectedRetransmit lost packetCwnd=cwnd2

AN=5MSS

SN 12MSS L=1MSS

AN=5MSS

4 pkts are already unACKed so donrsquot send anymore4250 4000

SN 15MSS L=1MSS5000 40005000 5000 SN 16MSS L=1MSS

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Fast recovery (actual not approximate) Upon the two DUP ACK arrival do nothing Donrsquot send

any packets (InFlight is the same) Upon the third Dup ACK

set SSThres=cwnd2 Cwnd=cwnd2+3 Retransmit the requested packet

Upon every DUP ACK cwnd=cwnd+1 If InFlightltcwnd send a packet and increment InFlight When a new ACK arrives set cwnd=ssthres (RENO) When an ACK arrives that ACKs all packets that were

outstanding when the first drop was detected cwnd=ssthres (NEWRENO)

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Congestion Avoidance (AIMD)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd4000 SN 1000

AN 30Length 1000

SN 2000AN 30Length 1000

inflight0

ssthresh0

4000 1000 0

4000 2000 0

SN 3000AN 30Length 1000

4000 3000 0

SN 4000AN 30Length 1000

4000 4000 0

SN 30AN 2000RWin 10000

4250 3000 0SN 5000AN 30Length 1000

4250 4000 0

SN 30AN 3000RWin 9000

SN 6000AN 30Length 1000

4500 3000 04500 4000 0

SN 30AN 4000Rwin 8000

SN 7000AN 30Length 1000

4750 3000 04750 4000 0

SN 30AN 2000RWin 7000

SN 8000AN 30Length 1000

5000 3000 05000 4000 0

5000 5000 0 SN 9000AN 30Length 1000

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Congestion Avoidance (AIMD) (actual not approximate)When an ACK arrives cwnd = cwnd + 1 floor(cwnd)When a drop is detected via triple-dup ACK cwnd = cwnd2

cwnd8000

inflight0

ssthresh0

8000 1000 0

8000 8000 0

SN 1MSS L=1MSS

AN=2MSS

SN 2MSS L=1MSS

SN 3MSS L=1MSS

SN 4MSS L=1MSS

SN 5MSS L=1MSS

SN 6MSS L=1MSS

SN 7MSS L=1MSS

SN 8MSS L=1MSS

AN=3MSS

AN=4MSS

AN=4MSS

AN=4MSS

AN=4MSS

SN 9MSS L=1MSS

SN 10MSS L=1MSS

SN 11MSS L=1MSS

3rd dup-ACK

8125 8000 0

8250 8000 08375 8000 0

7000 8000 4000

AN=4MSS

AN=4MSS

AN=4MSS

8000 8000 4000

9000 9000 4000

10000 10000 4000

SN 4MSS L=1MSS

SN 12MSS L=1MSS

SN 13MSS L=1MSS

AN=12MSS

4000 2000 0SN 14MSS L=1MSS

SN 15MSS L=1MSS

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Performance bull Q1 What is the rate that packets are sent

bullHow many pkts are send in a RTTbullRate = cwnd RTT

cwnd4

5

6

Seq (MSS)

1234

56789

101112131415

2345

5678910

1112131415

42545

475

52545658

bull Q2 at what rate does cwnd increase bullHow often does cwnd increase by 1bullEach RTT cwnd increases by 1

bull dRatedt = 1RTT

RTT

RTT

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Start Up

What should the initial value of cwnd be Option one large it should be a rough

guess of the steady state value of cwndbull But this might cause too much congestion

Option two do it more slowly = slow start Slow Start

Initially cwnd = cwnd0 (typical 1 2 or 3) When an non-dup ack arrives

bull cwnd = cwnd + 1 When a pkt loss is detected exit slow start

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Slow start

SYN Seq=20 Ack=X

SYN Seq=1000 Ack=21

SYN Seq=21 Ack=1001

Seq=21 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

Seq=1001 Ack=1021 size =0

Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000 Seq=1001 Ack=1021 size =0Seq=1021 Ack=1001 Data=lsquohelliprsquo size =1000Seq=2021 Ack=1001 Data=lsquohelliprsquo size =1000

cwnd

1

2

3

4

5678

Triple dup ack4

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Slow start Congestion avoidance

dropsdrop

How quickly does cwnd increase during slow startHow much does it increase in 1 RTTIt roughly doubles each RTT ndash it grows exponentiallydcnwddt = 2 cwnd

After a drop in slow start TCP switches to AIMD (congestion avoidance)

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Slow start

The exponential growth of cwnd during slow start can get a bit of control

To tame things Initially

cwnd = 1 2 or 3 SSThresh = SSThresh0 (eg 44MSS)

When an new ACK arrives cwnd = cwnd + 1 if cwnd gt= SSThresh go to congestion avoidance If a triple dup ACK occures cwnd=cwnd2 and go to

congestion avoidance

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Behavior

Slow start Congestion avoidance

dropsCwnd=ssthresh

Slow start Congestion avoidance

dropsdrop

cwnd

cwnd

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Time out

Detecting losses with time out is considered to be an indication of severe

When time out occurs Ssthresh = cwnd2 cwnd = 1 RTO = 2xRTO Enter slow start

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Time Outcwnd

8

SSThresh

X

RTO

1 4

2 4

3 4

4 4

425 X

45 X475 X

5 X

Cwnd = ssthresh =gt exit slow start and enter

congestion avoidance

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Time out

RTO

2xRTO

min(4xRTO 64 sec)

Give up if no ACK for ~120 sec

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Rough view of TCP congestion control

Slow start Congestion avoidance

dropsCwnd=ssthres

Slow start Congestion avoidance

dropsdrop

Slow start Congestion avoidance

dropsdrop

Slow start

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Tahoe (old version of TCP)

Slow start Congestion avoidance

dropsdrop

Slow start

Enter slow start after every loss

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Summary of TCP congestion control

Theme probe the system Slowly increase cwnd until there is a packet drop That must

imply that the cwnd size (or sum of windows sizes) is larger than the BWDP

Once a packet is dropped then decrease the cwnd And then continue to slowly increase

Two phases slow start (to get to the ballpark of the correct cwnd) Congestion avoidance to oscillate around the correct cwnd

size

Connectionestablishment

Slow-startCongestion avoidance

CwndgtssthressTriple dup ack

timeout

Connectiontermination

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Congestion avoidance state chart

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP Performance 1 ACK Clocking

What is the maximum data rate that TCP can send data

10Mbps1Gbps 1Gbpssource destination

Rate that pkts are sent = 1 Gbpspkt size = 1 pkt each 12 usec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 10 Mbpspkt size = 1 pkt each 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that ACKs are sent ACK 1 pkts = 10 Mbpspkt size= 1 ACK every 12 msec

Rate that pkts are sent = 1 pkt for each ACK = 1 pkt every 12 msec

The sending rate is the correct date rate No congestion should occur

This is due to ACK clocking pkts are clocked our as fast as ACK arrive

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP throughput

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP throughput

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP throughput

w

w2

Mean value= (w+w2)2

= w34

Throughput = wRTT = w34RTT

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP ThroughputHow many packets sent during one cycle (ie one tooth of the saw-tooth)

The ldquotoothrdquo starts at w2 increments by one up to ww2 + (w2+1) + (w2+2) + hellip + (w2+w2)

w2 +1 terms

= w2 (w2+1) + (0+1+2+hellipw2)= w2 (w2+1) + (w2(w2+1))2= (w2)^2 + w2 + 12(w2)^2 + 12w2= 32(w2)^2 + 32(w2)~ 38 w^2

So one out of 38 w^2 packets is droppedThis gives a loss probability of p = 1(38 w^2)Or w = sqrt(83) sqrt(p)

Combining with the first eq

Throughput = w34RTT = sqrt(83)34 (RTT sqrt(p))= sqrt(32) (RTT sqrt(p))

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

RTT unfairness

Throughput = sqrt(32) (RTT sqrt(p)) A shorter RTT will get a higher throughput even if the

loss probability is the same

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

Two connections share the same bottleneck so they share the same critical resourcesA yet the one with a shorter RTT receives higher throughput and thus receives a higher fraction of the critical resources

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP problems TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

p = 210-10

Random loss from bit-errors on fiber links may have a higher loss probability

New versions of TCP for high-speed

pRTT

MSS221

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

TCP over wireless

In the simple case wireless links have random losses

These random losses will result in a low throughput even if there is little congestion

However link layer retransmissions can dramatically reduce the loss probability

Nonetheless there are several problems Wireless connections might occasionally break

bull TCP behaves poorly in this case The throughput of a wireless link may quickly vary

bull TCP is not able to react quick enough to changes in the conditions of the wireless channel

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary

Chapter 3 Summary principles behind transport

layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

Next leaving the network

ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

  • Transport Layer
  • Transport Layer ndash Topics
  • Transport layer
  • Connection oriented connectionless
  • TCP vs UCP
  • Applications and Transport Protocols
  • Multiplexing with ports
  • Chapter 3 outline
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Slide 13
  • Principles of Reliable data transfer
  • Slide 15
  • Slide 16
  • Reliable data transfer getting started
  • Slide 18
  • Rdt10 reliable transfer over a reliable channel
  • Slide 20
  • Rdt20 channel with bit errors
  • rdt20 FSM specification
  • Slide 23
  • rdt20 has a fatal flaw
  • rdt21 sender handles garbled ACKNAKs
  • rdt21 receiver handles garbled ACKNAKs
  • Slide 27
  • Slide 28
  • rdt21 discussion
  • rdt22 a NAK-free protocol
  • rdt22 sender receiver fragments
  • Slide 32
  • rdt30 channels with errors and loss
  • rdt30 sender
  • Slide 35
  • rdt30 in action
  • Slide 37
  • Performance of rdt30
  • rdt30 stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocols
  • Selective repeat big picture
  • Go-Back-N
  • Slide 45
  • Slide 46
  • GBN sender extended Activity Diagram
  • GBN Receiver Activity Diagram
  • Slide 49
  • Slide 50
  • GBN sender extended FSM
  • GBN receiver extended FSM
  • GBN in Action
  • Optimal size of N in GBN
  • Selective Repeat
  • Selective repeat sender receiver windows
  • Selective repeat
  • Selective repeat in action
  • Summary of transport layer tools used so far
  • Slide 60
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP seq rsquos and ACKs
  • Seq no and ACKs
  • Seq no and ACKs - bidirectional
  • TCP Round Trip Time and Timeout
  • Slide 67
  • Example RTT estimation
  • Slide 69
  • Slide 70
  • RTO details
  • Lost Detection
  • Fast Retransmit
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Slide 75
  • Slide 76
  • TCP Flow Control
  • Flow control ndash so the receive doesnrsquot get overwhelmed
  • Slide 79
  • Slide 80
  • Receiver window
  • Slide 82
  • TCP Connection Management
  • Slide 84
  • Connection establishment
  • Connection with losses
  • SYN Attack
  • Slide 88
  • Defense from SYN Attack
  • SYN Cookie
  • TCP Connection Management (cont)
  • Slide 92
  • TCP Connection Management (cont)
  • Slide 94
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 3
  • Slide 99
  • Approaches towards congestion control
  • Slide 101
  • TCP congestion control additive increase multiplicative decrease (AIMD)
  • AIMD (approximate description)
  • Congestion Avoidance (AIMD) (approximate)
  • AIMD (approximately)
  • Fast recovery (actual not approximate)
  • Congestion Avoidance (AIMD)
  • Congestion Avoidance (AIMD) (actual not approximate)
  • TCP Performance
  • TCP Start Up
  • Slow start
  • Slide 112
  • Slide 113
  • TCP Behavior
  • Time out
  • Time Out
  • Time out
  • Rough view of TCP congestion control
  • TCP Tahoe (old version of TCP)
  • Summary of TCP congestion control
  • Slow start state chart
  • Congestion avoidance state chart
  • TCP sender congestion control
  • TCP Performance 1 ACK Clocking
  • TCP throughput
  • Slide 126
  • Slide 127
  • TCP Throughput
  • TCP Fairness
  • Why is TCP fair
  • RTT unfairness
  • Fairness (more)
  • TCP problems TCP over ldquolong fat pipesrdquo
  • TCP over wireless
  • Chapter 3 Summary