Chapter 7: Internet Transport Protocols


Transport Layer

Our goals:
• understand principles behind transport layer services:
    • multiplexing / demultiplexing data streams of several applications
    • reliable data transfer
    • flow control
    • congestion control

Chapter 6: rdt principles

Chapter 7:
• multiplex / demultiplex
• Internet transport layer protocols:
    • UDP: connectionless transport
    • TCP: connection-oriented transport
        • connection setup
        • data transfer
        • flow control
        • congestion control

Transport vs. network layer

The transport layer uses network layer services and adds value to these services.

Transport Layer | Network Layer
logical communication between processes | logical communication between hosts
exists only in hosts | exists in hosts and in routers
ignores network | routes data through network
port #s used for "routing" to the intended process inside destination computer | IP addresses used for routing in network

Socket Multiplexing

Multiplexing/demultiplexing

[Figure: host 1, host 2 and host 3, each with application, transport, network, link and physical layers. Processes P1 to P4 attach to the transport layer through sockets.]

Demultiplexing at rcv host:
• receive segment from L3
• deliver each received segment to the right socket

Multiplexing at send host:
• gather data from multiple sockets
• envelop data with headers (later used for demultiplexing)
• pass to L3

How demultiplexing works

• host receives IP datagrams
    • each datagram has source IP address, destination IP address in its header
        • used by network to get it there
    • each datagram carries one transport-layer segment
    • each segment has source, destination port number in its header
• host uses port #s (*) to direct segment to correct socket
    • from socket, data gets to the relevant application process

[Figure: TCP/UDP segment format, 32 bits wide. The L3 header carries source IP addr, dest IP addr and other IP header fields. The L4 header carries source port #, dest port # and other header fields. Application data (message) follows.]

(*) to find a TCP socket on server, source & dest. IP addresses are also needed, see details later

Connectionless demultiplexing (UDP)

• Processes create sockets with port numbers
• a UDP socket is identified by a pair of numbers: (my IP address, my port number)
• Client decides to contact:
    • a server (peer IP address)
    • an application (well-known port, "WKP")
• puts those into the UDP packet sent, written as:
    • dest IP address: in the IP header of the packet
    • dest port number: in its UDP header
• When server receives a UDP segment:
    • checks destination port number in segment
    • directs UDP segment to the socket with that port number
        • single server socket per application type
        • (packets from different remote sockets directed to same socket)
    • msg waits in socket queue and is processed in its turn
    • answer sent to the client socket (listed in Source fields of query packet)

Note: Realtime UDP applications have individual server sockets per client. However, their port numbers are distinct, since they are coordinated in advance by some signaling protocol. This is possible since the port number is not used to specify the application.
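The server-side rule above can be sketched in a few lines of Python. This is only a toy model, not an operating system's socket layer: the class `UdpHost` and its methods are hypothetical names, and the addresses and ports are borrowed from the example on the next slide.

```python
from collections import deque


class UdpHost:
    """Toy model of UDP demultiplexing on one host (illustrative only)."""

    def __init__(self, ip):
        self.ip = ip
        self.sockets = {}  # local port -> queue of (src_ip, src_port, payload)

    def bind(self, port):
        """Create a socket identified by (my IP address, my port number)."""
        self.sockets[port] = deque()

    def receive(self, src_ip, src_port, dst_port, payload):
        """Direct a segment to the socket with the destination port only."""
        queue = self.sockets.get(dst_port)
        if queue is None:
            return False  # no socket bound to that port: segment is dropped
        queue.append((src_ip, src_port, payload))
        return True


server = UdpHost("C")
server.bind(53)
server.receive("A", 9157, 53, b"query-1")
server.receive("B", 5775, 53, b"query-2")

# The source fields of each queued query give the "return address" for the reply.
return_addresses = [(ip, port) for ip, port, _ in server.sockets[53]]
```

Both queries land in the same queue because only the destination port is consulted; the source IP and port are kept solely so that the reply can be addressed.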

Connectionless demux (cont)

[Figure: two clients query one server socket over UDP and receive replies. At the server each query waits for the application, gets service, and is answered with a reply.]

Sockets:
• client socket: port=9157, IP=A
• client socket: port=5775, IP=B
• server socket: port=53, IP=C

Packets (IP header, UDP header, message):
• A to C: SP: 9157, DP: 53, S-IP: A, D-IP: C
• C to A: SP: 53, DP: 9157, S-IP: C, D-IP: A
• B to C: SP: 5775, DP: 53, S-IP: B, D-IP: C
• C to B: SP: 53, DP: 5775, S-IP: C, D-IP: B

SP = Source port number, DP = Destination port number, S-IP = Source IP Address, D-IP = Destination IP Address

SP and S-IP provide "return address"

Connection-oriented demux (TCP)

• TCP socket identified by 4-tuple:
    • local (my) IP address
    • local (my) port number
    • remote (peer) IP address
    • remote (peer) port #
• host receiving a packet uses all four values to direct the segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
    • each socket identified by its own 4-tuple
• Web server dedicates a different socket to each connecting client
    • If you open two browser windows, you generate 2 sockets at each end

Connection-oriented demux (cont)

[Figure: three TCP connections from client hosts A and B to one web server (IP: C, port 80), drawn across layers L1 to L5.]

LP = Local Port, RP = Remote Port, L-IP = Local IP, R-IP = Remote IP ("L" = Local = My, "R" = Remote = Peer)

client socket | server socket
LP= 9157, L-IP= A, RP= 80, R-IP= C | LP= 80, L-IP= C, RP= 9157, R-IP= A
LP= 5775, L-IP= B, RP= 80, R-IP= C | LP= 80, L-IP= C, RP= 5775, R-IP= B
LP= 9157, L-IP= B, RP= 80, R-IP= C | LP= 80, L-IP= C, RP= 9157, R-IP= B

Packets sent by the clients:
• SP: 9157, DP: 80, S-IP: A, D-IP: C
• SP: 5775, DP: 80, S-IP: B, D-IP: C
• SP: 9157, DP: 80, S-IP: B, D-IP: C

Connection-oriented Sockets

• Client socket has a port number unique in host
    • packet for client socket directed by the host OS based on dest. port only
• each server application has an always active waiting socket
    • that socket receives all packets not belonging to any established connection
    • these are packets that open new connections
• when waiting socket accepts a 'new connection' segment:
    • a new socket is generated at server with same port number
    • this is the working socket for that connection
    • next segments arriving at server on that connection will be directed to working socket
    • socket will be identified using all 4 identifiers
    • last slide shows working sockets on the server side

Note: Client IP + Client Port are globally unique
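A minimal sketch of the waiting-socket / working-socket rule, assuming a toy server model rather than a real TCP stack. The class `TcpHost`, its method names and the socket labels are hypothetical; the addresses and ports come from the previous slide.

```python
class TcpHost:
    """Toy model of TCP demultiplexing on a server host (illustrative only)."""

    def __init__(self, ip):
        self.ip = ip
        self.listening = {}  # local port -> label of the waiting socket
        self.working = {}    # (local IP, local port, remote IP, remote port) -> label

    def listen(self, port, label):
        self.listening[port] = label

    def demux(self, src_ip, src_port, dst_port, syn=False):
        """Return the label of the socket that receives this segment."""
        key = (self.ip, dst_port, src_ip, src_port)
        if key in self.working:
            return self.working[key]          # established connection: 4-tuple match
        if syn and dst_port in self.listening:
            # The waiting socket accepts the segment; a working socket with the
            # same local port is created for the new connection.
            self.working[key] = f"working-{len(self.working) + 1}"
            return self.listening[dst_port]
        return None                           # no matching socket


server = TcpHost("C")
server.listen(80, "waiting-80")
server.demux("A", 9157, 80, syn=True)
server.demux("B", 9157, 80, syn=True)
server.demux("B", 5775, 80, syn=True)
```

Note that clients A and B both use source port 9157, yet the server keeps them apart because the remote IP address is part of the key.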

UDP Protocol

UDP: User Datagram Protocol [RFC 768]

• simple transport protocol
• "best effort" service, UDP segments may be:
    • lost
    • delivered out of order to application
  with no correction by UDP
• UDP will discard bad checksum segments if so configured by application
• connectionless:
    • no handshaking between UDP sender, receiver
    • each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment saves delay
• no congestion control: better delay & BW
• simple: less memory & RT
• small segment header
• typical usage: realtime appl.
    • loss tolerant
    • rate sensitive
• other uses (why?):
    • DNS
    • SNMP

UDP segment structure

[Figure: UDP segment, 32 bits wide: source port #, dest port #, length, checksum, then application data (variable length).]

• length: total length of segment (bytes)

Checksum computed over:
• the whole segment, plus
• part of IP header:
    • both IP addresses
    • protocol field
    • UDP length

Checksum usage:
• computed at destination to detect errors
• on error, discard segment
• checksum is optional
    • if not used, sender puts checksum = all zeros
    • a computed checksum of zero is sent as all ones
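The checksum rules above can be illustrated with a small Python sketch of the Internet checksum (the 16-bit one's-complement sum). The function names are hypothetical, and building the covered bytes (segment plus the listed IP header fields) is left out; `sample` is just an arbitrary byte string.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                               # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in
    return ~total & 0xFFFF


def udp_checksum_field(covered: bytes) -> int:
    """Value for the checksum field; a computed zero is sent as all ones."""
    checksum = internet_checksum(covered)
    return 0xFFFF if checksum == 0 else checksum


sample = bytes.fromhex("0001f203f4f5f6f7")
field = internet_checksum(sample)

# Receiver check: summing the data together with the checksum yields zero.
verified = internet_checksum(sample + field.to_bytes(2, "big")) == 0
```

Reserving all zeros for "checksum not used" is what forces the sender to replace a computed zero with all ones; in one's-complement arithmetic both patterns represent zero, so the receiver's check still works.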

TCP Protocol

TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581

• point-to-point:
    • one sender, one receiver
    • works between sockets
• reliable, in-order byte stream:
    • no "message boundaries"
• pipelined:
    • TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
    • bi-directional data flow in same connection
    • MSS: maximum segment size
• connection-oriented:
    • handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled:
    • sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door.]

TCP segment structure

[Figure: TCP segment, 32 bits wide, with the fields listed below.]

• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len: hdr length in 32 bit words
• not used
• FLAGS (U, A, P, R, S, F):
    • URG: indicates start of urgent data
    • ACK: ACK # valid
    • PSH: push, deliver the data to the receiving application promptly
    • RST: break conn. immediately
    • SYN: initialize conn., synchronize SN
    • FIN: I wish to disconn.
    • PSH, URG seldom used, not clearly defined
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• ptr urgent data: ptr = end of urgent data
• Options (variable length)
• application data (variable length)

TCP sequence # (SN) and ACK # (AN)

SN:
• byte stream "number" of first byte in segment's data

AN:
• SN of next byte expected from other side
• it's a cumulative ACK

Qn: how does the receiver handle out-of-order segments?
• puts them in receive buffer but does not acknowledge them

Simple data transfer scenario (some time after conn. setup):

Direction | Segment | Comment
A to B | SN=42, AN=79, 100 data bytes | host A sends 100 data bytes
B to A | SN=79, AN=142, 50 data bytes | host B ACKs 100 bytes and sends 50 data bytes
A to B | SN=142, AN=129, no data | host A ACKs receipt of data, sends no data. WHY?
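The arithmetic behind the scenario is simply "AN = SN of the received segment plus its number of data bytes", provided the data arrived in order. A tiny Python check, using a hypothetical helper `ack_for` and ignoring 32-bit wraparound:

```python
def ack_for(sn: int, data_len: int) -> int:
    """AN returned after receiving data_len in-order bytes that start at sn."""
    return sn + data_len


an_sent_by_b = ack_for(42, 100)  # B received bytes 42..141 from A
an_sent_by_a = ack_for(79, 50)   # A received bytes 79..128 from B
```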

Connection Setup: Objective

• Agree on initial sequence numbers
    • a sender should not reuse a seq# before it is sure that all packets with the seq# are purged from the network
        • the network guarantees that a packet too old will be purged from the network: network bounds the life time of each packet
    • To avoid waiting for them to disappear, choose initial SN (ISN) far away from previous session
        • needs connection setup so that the sender tells the receiver initial seq#
• Agree on other initial parameters
    • e.g. Maximum Segment Size

TCP Connection Management

Setup: establish connection between the hosts before exchanging data segments
• called: 3 way handshake
• initialize TCP variables:
    • seq. #s
    • buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    • opens socket and cmds OS to connect it to server
• server: contacted by client
    • has waiting socket
    • accepts connection
    • generates working socket

Teardown: end of connection (we skip the details)

Three way handshake:
Step 1: client host sends TCP SYN segment to server
    • specifies initial seq # (ISN)
    • no data
Step 2: server host receives SYN, replies with SYNACK segment (also no data)
    • allocates buffers
    • specifies server initial SN & window size
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Three-Way Handshake (TWH)

[Figure: hosts A and B, each with a send buffer and a receive buffer.]

Step | Direction | Segment
1 | A to B | SYN, SN = X
2 | B to A | SYNACK, SN = Y, AN = X+1
3 | A to B | ACK, SN = X+1, AN = Y+1

After the handshake, A's send buffer and B's receive buffer start at X+1; B's send buffer and A's receive buffer start at Y+1.
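The numbering in the handshake can be written out as a short Python sketch. The function name and the tuple layout `(flags, SN, AN)` are assumptions made for illustration, the ISN values are arbitrary, and sequence-number wraparound is ignored.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three handshake segments as (flags, SN, AN) tuples."""
    syn = ("SYN", client_isn, None)                    # step 1: client's ISN = X
    synack = ("SYNACK", server_isn, client_isn + 1)    # step 2: SN = Y, AN = X+1
    ack = ("ACK", client_isn + 1, server_isn + 1)      # step 3: SN = X+1, AN = Y+1
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

The SYN carries no data but still consumes one sequence number, which is why each side acknowledges the peer's ISN plus one.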

Connection Close

Objective of closure handshake:
• each side can release resource and remove state about the connection
    • Close the socket

[Figure: closure exchange between client and server.
• client, initial close: sends FIN ("I am done. Are you done too?"); no data from client afterwards
• server, close: sends FIN ("I am done too. Goodbye!")
• labels on the timeline: "release resource?", "release resource", "release resource"]

TCP reliable data transfer

• TCP creates reliable service on top of IP's unreliable service
• pipelined segments
• cumulative acks
• single retransmission timer
• receiver accepts out of order segments but does not acknowledge them
• Retransmissions are triggered by timeout events
    • in some versions of TCP also by triple duplicate ACKs (see later)
• Initially consider simplified TCP sender:
    • ignore flow control, congestion control

TCP sender events:

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (timer relates to oldest unACKed segment)
• expiration interval: TimeOutInterval

timeout (*):
• retransmit segment that caused timeout
• restart timer

ACK rcvd:
• if ACK acknowledges previously unACKed segments
    • update what is known to be ACKed
        • Note: Ack is cumulative
    • start timer if there are outstanding segments

(*) retransmission done also on triple duplicate Ack (see later)

TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
        if (NextSeqNum - SendBase < N) then {
          create TCP segment with sequence number NextSeqNum
          if (timer currently not running)
            start timer
          pass segment to IP
          NextSeqNum = NextSeqNum + length(data)
        }
        else
          reject data   /* in truth: keep in send buffer until new Ack */

      event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

      event: ACK received, with ACK field value of y
        if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments)
            start timer
        }
    } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte

Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is ACKed
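The pseudocode above can be turned into a small runnable Python sketch. It is a simplification under stated assumptions: the timer is modelled as a boolean flag, "pass segment to IP" is modelled by appending to a log, and the class and attribute names are hypothetical. The scenario reuses SN=92 (8 bytes) and SN=100 (20 bytes) from the retransmission slides.

```python
class SimpleTcpSender:
    """Simplified sender: no flow or congestion control, one timer."""

    def __init__(self, initial_seq: int, window: int):
        self.send_base = initial_seq   # oldest not-yet-ACKed byte
        self.next_seq = initial_seq    # sequence number of the next new byte
        self.window = window           # N in the pseudocode
        self.unacked = {}              # sequence number -> segment data
        self.timer_running = False
        self.sent = []                 # log of (seq, data) passed to IP

    def on_app_data(self, data: bytes) -> bool:
        if self.next_seq - self.send_base >= self.window:
            return False               # caller keeps the data until a new ACK
        self.unacked[self.next_seq] = data
        self.sent.append((self.next_seq, data))
        self.timer_running = True      # start timer if not already running
        self.next_seq += len(data)
        return True

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)    # smallest not-yet-ACKed sequence number
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True  # restart timer

    def on_ack(self, y: int):
        if y > self.send_base:         # cumulative ACK for new data
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimpleTcpSender(initial_seq=92, window=1000)
sender.on_app_data(b"x" * 8)    # SN=92, 8 bytes
sender.on_app_data(b"y" * 20)   # SN=100, 20 bytes
sender.on_timeout()             # resends only SN=92
sender.on_ack(120)              # one cumulative ACK covers both segments
```

A single ACK with value 120 clears both outstanding segments, which is why a lost ACK for the first segment does not always force a retransmission.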

TCP actions on receiver events:

application takes data:
• free the room in buffer
• give the freed cells new numbers
    • circular numbering
• WIN increases by the number of bytes taken

data rcvd from IP:
• if checksum fails, ignore segment
• if checksum OK, then:
    • if data came in order:
        • update AN & WIN, as follows:
            • AN grows by the number of new in-order bytes
            • WIN decreases by same #
    • if data out of order:
        • put in buffer, but don't count it for AN / WIN

TCP: retransmission scenarios

A. normal scenario
1. Host A sends SN=92, 8 bytes data; start timer for SN 92
2. Host B replies AN=100; A stops timer
3. Host A sends SN=100, 20 bytes data; start timer for SN 100
4. Host B replies AN=120; A stops timer (NO timer)

B. lost ACK + retransmission
1. Host A sends SN=92, 8 bytes data; start timer for SN 92
2. Host B replies AN=100; the ACK is lost (X)
3. TIMEOUT at A: resend SN=92, 8 bytes data; start timer for new SN 92
4. Host B replies AN=100; A stops timer (NO timer)

Legend: timer setting, actual timer run

TCP retransmission scenarios (more)

C. lost ACK, NO retransmission
1. Host A sends SN=92, 8 bytes data (start timer for SN 92), then SN=100, 20 bytes data
2. Host B replies AN=100, which is lost (X), then AN=120
3. AN=120 arrives before the timeout; A stops timer (NO timer)

D. premature timeout
1. Host A sends SN=92, 8 bytes data (start timer for SN 92), then SN=100, 20 bytes data
2. TIMEOUT at A before an ACK arrives: resend SN=92, 8 bytes data (start for 92)
3. AN=100 arrives: stop, then start for 100
4. AN=120 arrives: stop (NO timer)
5. Host B receives the duplicate SN=92 segment: DROP! It answers with a redundant ACK (AN=120)

TCP ACK generation (Receiver rules) [RFC 1122, RFC 2581]

1. Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
   TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no data segment to send, then send ACK.

2. Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending.
   TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.

3. Event at receiver: arrival of out-of-order segment with higher-than-expected seq. #. Gap detected.
   TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte. This Ack carries no data & no new WIN.

4. Event at receiver: arrival of segment that partially or completely fills gap.
   TCP receiver action: immediately send ACK, provided that segment starts at lower end of 1st gap.

Fast Retransmit (Sender Rules)

• time-out period often relatively long:
    • causes long delay before resending lost packet
• idea: detect lost segments via duplicate ACKs
    • sender often sends many segments back-to-back
    • if segment is lost, there will likely be many duplicate ACKs for that segment

Rule: If sender receives 4 ACKs for same data (= 3 duplicates), it assumes that segment after ACKed data was lost:
• fast retransmit: resend segment immediately (before timer expires)

Fast Retransmit scenario

[Figure: Host A sends segments seq # x1, x2, x3, x4, x5; the segment with seq # x2 is lost (X). Host B sends ACK # x2 four times in total: one regular ACK followed by triple duplicate ACKs. Host A resends seq x2 before the timeout expires.]

Duplicate ACKs:
* no data in segment
* no window change

Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
      else {   /* a duplicate ACK for already ACKed segment */
        if (segment carries no data & doesn't change WIN)
          increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
          resend segment with sequence number y   /* fast retransmit */
          count of dup ACKs received for y = 0
        }
      }
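A runnable Python sketch of the duplicate-ACK rule follows. It keeps a single counter and resets it whenever a new ACK arrives, which is an assumption that stands in for the per-y count of the pseudocode; the class name `DupAckCounter` and its parameters are hypothetical.

```python
class DupAckCounter:
    """Counts duplicate ACKs and reports when fast retransmit should fire."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y: int, has_data: bool = False, changes_win: bool = False):
        """Return the sequence number to resend, or None."""
        if y > self.send_base:           # new data is acknowledged
            self.send_base = y
            self.dup_acks = 0
            return None
        if not has_data and not changes_win:
            self.dup_acks += 1           # a pure duplicate ACK
            if self.dup_acks == 3:
                self.dup_acks = 0
                return y                 # fast retransmit of the segment at y
        return None


counter = DupAckCounter(send_base=100)
# Four ACKs for the same data: one regular ACK, then three duplicates.
results = [counter.on_ack(200) for _ in range(4)]
```

Segments that carry data or change WIN are not counted, so ordinary reverse traffic does not trigger a retransmission.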

TCP: setting timeouts

General idea

Q: how to set TCP timeout interval?
• should be longer than RTT
    • but: RTT will vary
• if too short: premature timeout
    • unnecessary retransmissions
• if too long: slow reaction to segment loss

Set timeout = average + safe margin:

    Timeout Interval = Average + margin

Estimating Round Trip Time

• SampleRTT: measured time from segment transmission until receipt of ACK for it
• SampleRTT will vary, want a "smoother" estimated RTT
    • use several recent measurements, not just current SampleRTT

    EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT and EstimatedRTT.]

Setting Timeout

Problem:
• using the average of SampleRTT will generate many timeouts due to network variations

Solution:
• EstimatedRTT plus "safety margin"
    • large variation in EstimatedRTT -> requires larger safety margin

Estimate average deviation of RTT:

    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)

Then set timeout interval:

    TimeoutInterval = EstimatedRTT + 4 * DevRTT

[Figure: frequency (freq.) distribution of RTT.]
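The two moving averages and the timeout rule can be tried out with a short Python sketch. Two points are assumptions not stated on the slides: the estimator is seeded with the first sample (and half of it as the initial deviation), and DevRTT is updated using the EstimatedRTT from before the new sample is folded in. The class name and the sample values (in milliseconds) are hypothetical.

```python
class RttEstimator:
    """EWMA-based timeout estimator following the formulas above."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = first_sample / 2     # assumption: initial deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)
timeout = estimator.update(180.0)   # one unusually slow sample
```

With α = 0.125 a single outlier moves EstimatedRTT only slightly, while the larger β lets DevRTT, and with it the safety margin, react more quickly.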

TCP: Flow Control

TCP Flow Control: Simple Case

• TCP at A sends data to B
    • The picture below shows the TCP receive-buffer at B
• application process at B may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• flow control matches the send rate of A to the receiving application's drain rate at B
• Receive buffer size set by OS at connection init
• WIN = window size = number of bytes A may send starting at AN

[Figure: node B, receive process. Data from IP (sent by TCP at A) enters the receive buffer. The buffer holds TCP data in buffer followed by spare room (WIN), with AN marking where the spare room starts; data is taken by the application.]

TCP Flow control: General Case

Formulas:
• AN = first byte not received yet
    • sent to A in TCP header
• AckedRange = AN - FirstByteNotReadByAppl = # bytes rcvd in sequence & not taken
• WIN = RcvBuffer - AckedRange = "SpareRoom"
• AN and WIN sent to A in TCP header
• Data received out of sequence is considered part of 'spare room' range

Procedure:
• Rcvr advertises "spare room" by including value of WIN in his segments
• Sender A is allowed to send at most WIN bytes in the range starting with AN
    • guarantees that receive buffer doesn't overflow

[Figure: node B, receive process. Data from IP (sent by TCP at A) enters the Rcv Buffer. The buffer holds ACKed data in buffer followed by spare room (WIN) starting at AN; non-ACKed data in buffer (arrived out of order) lies inside the spare room and is ignored; data is taken by the application.]
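The formulas above translate directly into code. The following Python sketch uses hypothetical function names and arbitrary byte counts, and it ignores sequence-number wraparound.

```python
def advertised_window(rcv_buffer: int, an: int, first_byte_not_read: int) -> int:
    """WIN = RcvBuffer - AckedRange, with AckedRange = AN - FirstByteNotReadByAppl."""
    acked_range = an - first_byte_not_read
    return rcv_buffer - acked_range


def sender_may_send(seq: int, length: int, an: int, win: int) -> bool:
    """True if bytes [seq, seq + length) lie inside the range [AN, AN + WIN)."""
    return seq >= an and seq + length <= an + win


# 1000 in-sequence bytes (2000..2999) are still waiting for the application.
win = advertised_window(rcv_buffer=4096, an=3000, first_byte_not_read=2000)
fits = sender_may_send(seq=3000, length=win, an=3000, win=win)
overflows = sender_may_send(seq=3000, length=win + 1, an=3000, win=win)
```

Out-of-order data does not appear in the calculation at all, which matches the slide's statement that it is treated as part of the spare room.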

TCP flow control: example 1

TCP flow control: example 2

TCP: Congestion Control

TCP Congest'n Ctrl Overview (1)

• Closed-loop, end-to-end, window-based congestion control
• Designed by Van Jacobson in late 1980s, based on the AIMD algorithm of Dah-Ming Chiu and Raj Jain
• Works well so far: the bandwidth of the Internet has increased by more than 200,000 times
• Many versions
    • TCP-Tahoe: this is a less optimized version
    • TCP-Reno: many OSs today implement Reno type congestion control
    • TCP-Vegas: not currently used

For more details: see Stevens: TCP/IP Illustrated; K-R chapter 6.7, or read:
http://lxr.linux.no/source/net/ipv4/tcp_input.c for linux implementation

TCP Congest'n Ctrl Overview (2)

• Dynamic window size [Van Jacobson]
    • Initialization: MI (Multiplicative Increase)
        • Slow start
    • Steady state: AIMD (Additive Increase / Multiplicative Decrease)
        • Congestion Avoidance
• "Congestion is timeout || 3 duplicate ACK"
    • TCP Tahoe: treats both cases identically
    • TCP Reno: treats each case differently
• "Congestion = (also) higher latency"
    • TCP Vegas

General method

• sender limits rate by limiting number of unACKed bytes "in pipeline":

    LastByteSent - LastByteAcked ≤ cwnd (*)

    • cwnd: differs from WIN (how, why?)
    • sender limited by ewnd ≡ min(cwnd, WIN) (effective window)
• roughly,

    rate = ewnd / RTT bytes/sec

• cwnd is dynamic, function of perceived network congestion

[Figure: the sender transmits cwnd bytes, then waits one RTT for the ACK(s).]
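As a quick numeric illustration of the effective window and the rough rate formula, here is a minimal Python sketch. The function names and the example values (20,000-byte cwnd, 64,000-byte WIN, 250 ms RTT) are assumptions chosen only for the arithmetic.

```python
def effective_window(cwnd: int, win: int) -> int:
    """ewnd = min(cwnd, WIN): the tighter of congestion and flow control."""
    return min(cwnd, win)


def approx_rate(cwnd: int, win: int, rtt_seconds: float) -> float:
    """Rough sending rate in bytes/sec: one effective window per RTT."""
    return effective_window(cwnd, win) / rtt_seconds


ewnd = effective_window(cwnd=20_000, win=64_000)
rate = approx_rate(cwnd=20_000, win=64_000, rtt_seconds=0.25)
```

Here the network (cwnd) is the bottleneck; with a small receive buffer the roles would swap and WIN would cap the rate instead.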

The Basic Two Phases

[Figure: cwnd over time, starting at MSS. Slow start: Multiplicative Increase. Congestion avoidance: Additive Increase.]

Pure AIMD: Bandwidth Probing Principle

• "probing for bandwidth": increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate
    • continue to increase on ACK, decrease on loss (since available bandwidth is changing, depending on other connections in network)

[Figure: TCP's "sawtooth" behavior, sending rate over time. AI: ACKs being received, so increase rate slowly. MD: loss (X), so decrease rate fast.]

• Q: how fast to increase/decrease?
    • details to follow
• this model ignores Slow Start

TCP Slowstart: MI

Slowstart algorithm:

    initialize: cwnd = 1 MSS
    for (each segment ACKed)
      cwnd += MSS (*)
    until (congestion event OR cwnd ≥ threshold)

    On congestion event: {
      Threshold = cwnd/2
      cwnd = 1 MSS
    }

(*) doubled per RTT:
• exponential increase in window size (very fast!)
• therefore slowstart lasts a short time

[Figure: Host A sends one segment to Host B; one RTT later two segments; then four segments.]

* used in all TCP versions
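The doubling per RTT follows from adding one MSS per ACKed segment, as the following Python sketch shows. It assumes no loss and that every segment of a window is ACKed within the same RTT; the MSS value of 1460 bytes and the function name are illustrative assumptions.

```python
MSS = 1460  # bytes; an illustrative value


def slow_start_rounds(threshold: int, mss: int = MSS):
    """Return cwnd at the start of each RTT until cwnd >= threshold."""
    cwnd = mss
    history = [cwnd]
    while cwnd < threshold:
        segments_acked = cwnd // mss       # one ACK per segment in the window
        cwnd += segments_acked * mss       # cwnd += MSS for each segment ACKed
        history.append(cwnd)
    return history


history = slow_start_rounds(threshold=16 * MSS)
windows_in_segments = [cwnd // MSS for cwnd in history]
```

Reaching a threshold of 16 segments takes only four RTTs, which is why the slow start phase is short despite its name.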

TCP: congestion avoidance (CA)

• when cwnd > ssthresh grow cwnd linearly:
    • as long as all ACKs arrive, increase cwnd by ≈1 MSS per RTT
    • approach possible congestion slower than in slowstart
    • implementation: cwnd += MSS^2/cwnd for each ACK received

AIMD:
• ACKs: increase cwnd by 1 MSS per RTT: additive increase
• loss (*): cut cwnd in half: multiplicative decrease
    • true in macro picture
    • in actual algorithm may have Slow Start first to grow up to this value (+)

(*) = Timeout or 3 Duplicate
(+) depends on case & TCP type
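To see why the per-ACK update cwnd += MSS^2/cwnd amounts to roughly one MSS per RTT, the Python sketch below applies it once for every segment in a window. It assumes one ACK per full segment and no loss; the MSS value and the function name are illustrative.

```python
MSS = 1460  # bytes; an illustrative value


def congestion_avoidance_round(cwnd: float, mss: int = MSS) -> float:
    """Apply cwnd += MSS^2/cwnd once per ACK for one window of segments."""
    acks = int(cwnd // mss)            # one ACK per full segment in the window
    for _ in range(acks):
        cwnd += mss * mss / cwnd
    return cwnd


before = 10 * MSS
after = congestion_avoidance_round(before)
growth_in_mss = (after - before) / MSS
```

The growth comes out slightly below one MSS because cwnd increases a little with each ACK, making every subsequent increment a bit smaller.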

TCP Tahoe

[Figure: TCP Tahoe, showing SlowStart (SSt) and CA phases, additive increase (AI), and multiplicative decrease (MD) on T/O or 3 Dup.]

• Initialize with SlowStart state with cwnd = 1 MSS
• When cwnd ≥ ssthresh change to CA state
• When sense congestion (*):
    • set ssthresh = ewnd/2 (+)
    • set cwnd = 1 MSS
    • change state to SlowStart

(*) Timeout or Triple Duplicate Ack

(+) recall ewnd = min(cwnd, WIN); in our discussion here we assume that WIN > cwnd, so ewnd = cwnd
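The Tahoe rules can be traced with a coarse Python sketch that advances one RTT at a time. Several simplifications are assumptions: the window is counted in whole segments (MSS = 1), WIN is taken to be larger than cwnd, and slow start growth is capped at ssthresh so the switch to CA happens exactly at the threshold. The class and method names are hypothetical.

```python
MSS = 1  # count the window in segments for readability


class TahoeWindow:
    """Per-RTT model of TCP Tahoe's cwnd (illustrative only)."""

    def __init__(self, ssthresh: int):
        self.cwnd = 1 * MSS
        self.ssthresh = ssthresh
        self.state = "slow_start"

    def on_rtt_without_loss(self):
        if self.state == "slow_start":
            self.cwnd = min(self.cwnd * 2, self.ssthresh)   # doubled per RTT
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS                                # about 1 MSS per RTT

    def on_congestion(self):
        """Timeout or triple duplicate ACK: Tahoe treats both identically."""
        self.ssthresh = self.cwnd // 2
        self.cwnd = 1 * MSS
        self.state = "slow_start"


window = TahoeWindow(ssthresh=8)
trace = [window.cwnd]
for _ in range(7):
    window.on_rtt_without_loss()
    trace.append(window.cwnd)
window.on_congestion()
trace.append(window.cwnd)
for _ in range(3):
    window.on_rtt_without_loss()
    trace.append(window.cwnd)
```

The trace shows the characteristic pattern: exponential growth to the threshold, linear growth beyond it, and a fall back to one segment on any congestion signal.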

TCP Reno

Rationale:
• triple duplicate event shows less congestion than timeout
    • first segment probably lost but some others arrived
• therefore on 3Dup, cwnd is decreased to ewnd/2, skipping SlowStart stage
    • less aggressive than on T/O
• this is an approximate description; more details to the right and two slides below

TCP Reno Procedure:
• Initialize with SlowStart
• Slowstart as in Tahoe
• CA growth as in Tahoe
• On T/O, act as in Tahoe
• On Triple Duplicate:
    • set ssthresh = ewnd/2
    • enter Fast Recovery state
        • this is a temporary state until a non-Dup Ack arrives
    • when Fast Recovery ends, set: cwnd = ssthresh

Fast Recovery

Rationale:
• cwnd increases only when a new segment is Ack'ed
• in the 3 Dup situation, it may take time until such Ack arrives
• Until that time:
    • we increase cwnd on the arrival of each duplicate Ack, including the three that triggered Fast Retransmit
• when new Ack arrives, set cwnd = ssthresh

Fast Recovery State:
• Initialize: cwnd = ssthresh + 3 MSS
• on each additional duplicate Ack, increase cwnd by MSS
• when a new segment is acknowledged, set cwnd = ssthresh
    • recall that ssthresh was set to half of the last ewnd value in CA state

Page 53: 1 Chapter 7 Internet Transport Protocols. Transport Layer Our goals: r understand principles behind transport layer services: m Multiplexing / demultiplexing.

TCP Reno cwnd Trace

[Figure: congestion window (y-axis, 0 to 70) versus time (x-axis, 0 to 60). Curves: congestion window and threshold. Annotations: timeouts, slow start periods, additive increase (CA) periods, fast retransmission on triple duplicate Ack; the fast recovery stage is skipped in this trace.]

Page 54

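TCP Reno Cong. Ctrl State Transition Diagram

[Figure: state transition diagram with three states: slow start, congestion avoidance, fast recovery.]

• slow start → congestion avoidance: cwnd > ssthresh
• slow start → slow start: loss: timeout
• slow start → fast recovery: loss: 3dupACK
• congestion avoidance → slow start: loss: timeout
• congestion avoidance → fast recovery: loss: 3dupACK
• fast recovery → congestion avoidance: new ACK
• fast recovery → slow start: loss: timeout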

Page 55

TCP Reno Congestion Control FSM

INIT: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++ (check == 3?)
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start

congestion avoidance
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++ (check == 3?)
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
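The Reno FSM can be written as an event-driven class with one handler per event type. This is a minimal sketch under stated assumptions: the class name `RenoSender`, the MSS value of 1000 bytes, and the string state names are chosen for illustration, and the actual transmission and retransmission of segments is not modelled, only the bookkeeping of cwnd, ssthresh and the duplicate Ack counter.

```python
from dataclasses import dataclass

MSS = 1000                 # bytes; illustrative value
INIT_SSTHRESH = 64 * 1024  # 64 KB, as in the INIT transition


@dataclass
class RenoSender:
    """Tracks cwnd/ssthresh per the Reno FSM; no segments are actually sent."""
    cwnd: float = MSS
    ssthresh: float = INIT_SSTHRESH
    dup_acks: int = 0
    state: str = "slow start"

    def on_new_ack(self) -> None:
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += MSS                       # one MSS per new ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)   # about one MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast recovery":
            self.cwnd += MSS                       # inflate per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                     # fast retransmit trigger
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast recovery"

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow start"
```

Feeding a sequence of events (for example four new ACKs, then three duplicate ACKs, then a new ACK) to an instance and printing `state` and `cwnd` after each call reproduces the transitions listed in the FSM.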

Page 56

Popular “flavors” of TCP

[Figure: cwnd window size (in segments) versus transmission round for TCP Tahoe and TCP Reno, with the ssthresh levels marked.]

Page 57

Summary: TCP Reno Congestion Control

• when cwnd < ssthresh, sender is in slow-start phase, window grows exponentially.
• when cwnd >= ssthresh, sender is in congestion-avoidance phase, window grows linearly.
• when triple duplicate ACK occurs, ssthresh set to cwnd/2, cwnd eventually set to ~ ssthresh (after detour to Fast Recovery state).
• when timeout occurs, ssthresh set to cwnd/2, cwnd set to 1 MSS.

Page 58

TCP throughput

Q: what’s the average throughput of TCP as a function of window size and RTT?
• ignoring slow start
• let W be the window size when loss occurs
• when the window is W, throughput is W/RTT
• just after loss, the window drops to W/2 and throughput to W/(2·RTT), then grows linearly back to W
• average throughput: 0.75 W/RTT
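The 0.75 W/RTT figure is the mean of a window that ramps linearly from W/2 to W. The short check below averages that sawtooth numerically; the function name `average_sawtooth_throughput` is hypothetical, and the sketch assumes an even W measured in segments and a window that grows by exactly one segment per RTT.

```python
def average_sawtooth_throughput(w_max, rtt):
    """Average rate (segments/s) over one cycle from w_max/2 up to w_max."""
    windows = range(w_max // 2, w_max + 1)  # one window value per RTT
    total_segments = sum(windows)           # data sent during the cycle
    return total_segments / (len(windows) * rtt)


w, rtt = 40, 0.1
avg = average_sawtooth_throughput(w, rtt)
print(avg, 0.75 * w / rtt)  # both values are approximately 300 segments/s
```

Because the window values form an arithmetic sequence, their mean is (W/2 + W)/2 = 0.75 W, which is where the factor comes from.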

Page 59

TCP Fairness

fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Page 60

Why is TCP fair?

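Two competing sessions (Tahoe, Slow Start ignored):
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput (y-axis, 0 to R) versus Connection 1 throughput (x-axis, 0 to R), with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

Points along the trajectory:
• start at (a, b)
• after additive increase: (a+t, b+t), on the line y = x + (b-a)
• after loss: ((a+t)/2, (b+t)/2), on the line y = x + (b-a)/2
• after additive increase: (a/2+t/2+t1, b/2+t/2+t1), still on the line y = x + (b-a)/2
• after the next loss: on the line y = x + (b-a)/4

The offset from the equal-share line y = x is halved by every loss, so the two sessions converge toward equal throughput.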
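The convergence argument can be checked with a toy simulation of two AIMD flows. This is a sketch under simplifying assumptions that go beyond the slide: both flows see every loss at the same time, a loss happens exactly when the combined rate exceeds the capacity, and both increase by the same step per round. The function name `aimd_two_flows` and its parameters are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds, step=1.0):
    """Return the list of (x1, x2) rates for two synchronized AIMD flows."""
    trace = [(x1, x2)]
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2        # loss: multiplicative decrease
        else:
            x1, x2 = x1 + step, x2 + step  # no loss: additive increase
        trace.append((x1, x2))
    return trace


trace = aimd_two_flows(10.0, 70.0, capacity=100.0, rounds=500)
first_gap = abs(trace[0][1] - trace[0][0])
last_gap = abs(trace[-1][1] - trace[-1][0])
print(first_gap, last_gap)  # the gap starts at 60 and ends close to 0
```

Additive increase leaves the gap x2 - x1 unchanged and each halving cuts it in two, which matches the sequence of lines y = x + (b-a), y = x + (b-a)/2, y = x + (b-a)/4.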

Page 61

Fairness (more)

Fairness and UDP
• multimedia apps often do not use TCP
• do not want rate throttled by congestion control
• instead use UDP: pump audio/video at constant rate, tolerate packet loss

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between two hosts
• web browsers do this
• example: link of rate R already supporting 9 connections
• new app asks for 1 TCP, gets rate R/10
• new app asks for 11 TCPs, gets 11R/20 > R/2 !!
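The arithmetic behind the parallel-connection example is a simple ratio. The sketch below assumes, as the fairness goal does, that every TCP connection on the link receives an equal share R/K; the function name `app_share` is hypothetical, and the result is expressed as a fraction of R.

```python
from fractions import Fraction


def app_share(existing, opened):
    """Fraction of link rate R that an app gets by opening `opened` connections
    on a link already carrying `existing` connections (equal per-connection share)."""
    return Fraction(opened, existing + opened)


print(app_share(9, 1))   # 1/10
print(app_share(9, 11))  # 11/20
```

With 11 of the 20 connections, the new app takes 11R/20, which is slightly more than half of the link.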

Page 62

Extra Slides

Page 63

Exercise

MSS = 1000

Only one event per row
