Ch. 7 : Internet Transport Protocols


Transport Layer

Our goals:
• understand the principles behind transport layer services:
  - multiplexing / demultiplexing data streams of several applications
  - reliable data transfer
  - flow control
  - congestion control
• Chapter 6: rdt principles
• Chapter 7: multiplex / demultiplex, and the Internet transport layer protocols:
  - UDP: connectionless transport
  - TCP: connection-oriented transport (connection setup, data transfer, flow control, congestion control)

Transport vs. network layer

The transport layer uses network layer services and adds more value to these services.

Transport layer:
• logical communication between processes
• exists only in hosts
• ignores the network
• port #s used for routing in the destination computer

Network layer:
• logical communication between hosts
• exists in hosts and in routers
• routes data through the network
• IP addresses used for routing in the network

Multiplexing & Demultiplexing

Multiplexing/demultiplexing

[Figure: hosts 1, 2 and 3, each with an application / transport / network / link / physical stack; processes P1 to P4 attach to the transport layer through sockets.]

Multiplexing at send host: gather data from multiple sockets, envelop data with headers (later used for demultiplexing), pass to L3.

Demultiplexing at rcv host: receive segment from L3, deliver each received segment to the correct socket.

How demultiplexing works

• host receives IP datagrams
  - each datagram has a source IP address and a destination IP address in its header
  - each datagram carries one transport-layer segment
  - each segment has a source and a destination port number in its header
• host uses port numbers, and sometimes also IP addresses, to direct the segment to the correct socket
  - from the socket, data gets to the relevant application process

[Figure: TCP/UDP segment format, 32 bits wide. The L3 header carries source IP addr, dest IP addr and other IP header fields; the L4 header carries source port #, dest port # and other header fields; the application data (message) follows.]

Connectionless demultiplexing (UDP)

• Processes create sockets with port numbers
• a UDP socket is identified by a pair of numbers: (my IP address, my port number)
• Client decides to contact a server (peer IP address) and an application (peer port #)
• Client puts those into the UDP packet it sends; they are written as:
  - dest IP address: in the IP header of the packet
  - dest port number: in its UDP header
• When the server receives a UDP segment:
  - it checks the destination port number in the segment
  - it directs the UDP segment to the socket with that port number (packets from different remote sockets are directed to the same socket)
  - the UDP message waits in the socket queue and is processed in its turn
  - the answer message is sent to the client UDP socket (listed in the source fields of the query packet)

Connectionless demux (cont)

[Figure: client A (client socket: port=9157, IP=A) and client B (client socket: port=5775, IP=B) both send a message to the server socket (port=53, IP=C). Each message waits for the application, gets service, and is answered with a reply.]

Legend: SP = source port number, DP = destination port number, S-IP = source IP address, D-IP = destination IP address.

Packets (UDP header and IP header fields):
• A to C: SP: 9157, DP: 53, S-IP: A, D-IP: C
• C to A: SP: 53, DP: 9157, S-IP: C, D-IP: A
• B to C: SP: 5775, DP: 53, S-IP: B, D-IP: C
• C to B: SP: 53, DP: 5775, S-IP: C, D-IP: B

SP and S-IP provide the “return address”.

Connection-oriented demux (TCP)

• TCP socket identified by a 4-tuple:
  - local (my) IP address
  - local (my) port number
  - remote IP address
  - remote port number
• receiving host uses all four values to direct the segment to the appropriate socket
• Server host may support many simultaneous TCP sockets:
  - each socket is identified by its own 4-tuple
• Web servers have a different socket for each connecting client
  - if you open two browser windows, you generate 2 sockets at each end
  - non-persistent HTTP will open a different socket for each request
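The two demultiplexing rules can be contrasted in a small sketch. The table names, the socket labels and both helper functions are hypothetical, chosen only for illustration; a real stack also keeps listening sockets for new TCP connections, which this sketch leaves out.

```python
# Hypothetical socket tables; the string values stand in for socket objects.
udp_sockets = {}   # (local IP, local port) -> socket
tcp_sockets = {}   # (local IP, local port, remote IP, remote port) -> socket


def demux_udp(dst_ip, dst_port):
    """UDP: only the destination (IP, port) pair selects the socket."""
    return udp_sockets.get((dst_ip, dst_port))


def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    """TCP: all four values select the socket."""
    return tcp_sockets.get((dst_ip, dst_port, src_ip, src_port))


# Addresses and ports as used on the slides: UDP server on C:53, web server on C:80.
udp_sockets[("C", 53)] = "udp-server"
tcp_sockets[("C", 80, "A", 9157)] = "tcp-conn-1"
tcp_sockets[("C", 80, "B", 5775)] = "tcp-conn-2"
tcp_sockets[("C", 80, "B", 9157)] = "tcp-conn-3"
```

With these tables, every UDP packet addressed to C:53 reaches the same socket regardless of its sender, while TCP segments from A:9157 and B:9157 reach different sockets even though the source port is the same.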

Connection-oriented demux (cont)

[Figure: client A and client B open TCP connections to port 80 on server C; processes P1 to P6 each own one socket.]

Legend: LP = local port, RP = remote port, L-IP = local IP, R-IP = remote IP (“L” = local = my, “R” = remote = peer).

Client sockets:
• on A: LP = 9157, L-IP = A, RP = 80, R-IP = C
• on B: LP = 5775, L-IP = B, RP = 80, R-IP = C
• on B: LP = 9157, L-IP = B, RP = 80, R-IP = C

Server sockets:
• LP = 80, L-IP = C, RP = 9157, R-IP = A
• LP = 80, L-IP = C, RP = 5775, R-IP = B
• LP = 80, L-IP = C, RP = 9157, R-IP = B

Packets:
• SP: 9157, DP: 80, S-IP: A, D-IP: C
• SP: 5775, DP: 80, S-IP: B, D-IP: C
• SP: 9157, DP: 80, S-IP: B, D-IP: C

UDP Protocol

UDP: User Datagram Protocol [RFC 768]

• simple transport protocol
• “best effort” service; UDP segments may be:
  - lost
  - delivered out of order to the application
  with no correction by UDP
• UDP will discard bad checksum segments if so configured by the application
• connectionless:
  - no handshaking between UDP sender and receiver
  - each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment: saves delay
• no congestion control: better delay & BW
• simple: small segment header
• typical usage: realtime applications
  - loss tolerant
  - rate sensitive
• other uses (why?):
  - DNS
  - SNMP

UDP segment structure

[Figure: UDP segment, 32 bits wide: source port #, dest port #, length, checksum, then application data (variable length).]

• length: total length of the segment (bytes)
• checksum, computed over:
  - the whole segment
  - part of the IP header: both IP addresses, the protocol field, and the UDP length
• checksum usage:
  - computed at the destination to detect errors
  - in case of error, UDP will discard the segment, or …

UDP checksum

Goal: detect “errors” (e.g., flipped bits) in the transmitted segment

Sender:
• treat segment contents as a sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value into the UDP checksum field

Receiver:
• compute checksum of the received segment
• check if the computed checksum equals the checksum field value:
  - NO: error detected
  - YES: no error detected.
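The checksum computation can be sketched in a few lines. This is a minimal illustration, not a full UDP implementation: the function names are made up here, the pseudo-header fields are left out, and the final bit inversion follows the Internet checksum convention (the field holds the 1’s complement of the 1’s complement sum).

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add the data as 16-bit big-endian words, wrapping any carry back in."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Value the sender writes into the checksum field."""
    return ~ones_complement_sum16(data) & 0xFFFF


def receiver_accepts(data: bytes, checksum_field: int) -> bool:
    """Receiver recomputes the checksum and compares it with the field."""
    return internet_checksum(data) == checksum_field
```

For example, `receiver_accepts(payload, internet_checksum(payload))` holds for an unchanged payload, while flipping a single bit of the payload makes the comparison fail.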

TCP Protocol

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

• point-to-point: one sender, one receiver, between sockets
• reliable, in-order byte stream: no “message boundaries”
• pipelined: TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
• flow controlled: sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its own socket door.]

TCP segment structure

[Figure: TCP segment, 32 bits wide: source port #, dest port #; sequence number; acknowledgement number; head len, not used, flag bits U A P R S F, rcvr window size; checksum, ptr to urgent data; options (variable length); application data (variable length).]

• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len: hdr length in 32-bit words
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• rcvr window size: # bytes rcvr is willing to accept
• checksum: Internet checksum (as in UDP)
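As an illustration of the field layout, the sketch below unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The function name and the returned dictionary keys are hypothetical; options are skipped rather than decoded, and no checksum verification is attempted.

```python
import struct

# Fixed part of the TCP header: ports, SN, AN, offset+flags, window, checksum, urgent ptr.
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")          # 20 bytes, network byte order
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = TCP_FIXED_HEADER.unpack_from(segment)
    header_len = (offset_flags >> 12) * 4              # head len is counted in 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES) if offset_flags & (1 << bit)}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent_ptr": urgent,
        "data": segment[header_len:],                  # options, if any, are skipped
    }
```

A segment built with `TCP_FIXED_HEADER.pack(9157, 80, 42, 79, (5 << 12) | 0x18, 4096, 0, 0) + b"hello"` parses back to a 20-byte header with the PSH and ACK flags set and `b"hello"` as data.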

TCP sequence # (SN) and ACK (AN)

SN:
• byte-stream “number” of the first byte in the segment’s data

AN:
• SN of the next byte expected from the other side
• cumulative ACK

Qn: how does the receiver handle out-of-order segments?
• puts them in the receive buffer but does not acknowledge them

Simple data transfer scenario (some time after conn. setup), between Host A and Host B:
• host A sends 100 data bytes: SN=42, AN=79, 100 data bytes
• host B ACKs the 100 bytes and sends 50 data bytes: SN=79, AN=142, 50 data bytes
• host A ACKs receipt of the data and sends no data (WHY?): SN=142, AN=129, no data
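The numbers in the scenario follow from one rule, sketched below under the assumption that every segment arrives in order. The function name is made up for illustration.

```python
def reply_numbers(rcvd_sn: int, rcvd_an: int, rcvd_len: int) -> tuple:
    """Return (SN, AN) for the reply to an in-order segment.

    The reply's SN is the byte the peer expects next (the received AN);
    the reply's AN is the first byte after the received data.
    """
    return rcvd_an, rcvd_sn + rcvd_len


# Scenario from the slide: A sends SN=42, AN=79 with 100 bytes.
sn_b, an_b = reply_numbers(42, 79, 100)      # B's segment
sn_a, an_a = reply_numbers(sn_b, an_b, 50)   # A's pure ACK after B's 50 bytes
```

Here `(sn_b, an_b)` is `(79, 142)` and `(sn_a, an_a)` is `(142, 129)`, matching the three segments listed above.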

Connection Setup: Objective

• Agree on initial sequence numbers
  - a sender should not reuse a seq# before it is sure that all packets with that seq# are purged from the network
    • the network guarantees that a packet that is too old will be purged: the network bounds the lifetime of each packet
  - to avoid waiting for seq#s to disappear, start a new session with a seq# far away from the previous one
    • needs connection setup so that the sender tells the receiver the initial seq#
• Agree on other initial parameters
  - e.g. Maximum Segment Size

TCP Connection Management

Setup: establish connection between the hosts before exchanging data segments
• called: 3-way handshake
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  - opens socket and cmds OS to connect it to server
• server: contacted by client
  - has waiting socket
  - accepts connection
  - generates working socket

Teardown: end of connection (we skip the details)

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data

Step 2: server host receives SYN, replies with SYNACK segment (also no data)
  - allocates buffers
  - specifies server initial SN & window size

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Three-Way Handshake (TWH)

[Figure: hosts A and B, each with a send buffer and a receive buffer.]

• A to B: SYN, SN = X
• B to A: SYNACK, SN = Y, AN = X+1
• A to B: ACK, SN = X+1, AN = Y+1

After the handshake, A’s send buffer and B’s receive buffer start at X+1; B’s send buffer and A’s receive buffer start at Y+1.

Connection Close

Objective of closure handshake: each side can release resources and remove state about the connection
• Close the socket

[Figure: the client performs the initial close and sends “I am done. Are you done too?”; the server closes and answers “I am done too. Goodbye!”. Labels on the timeline: close, close, release resource?, release resource, release resource.]

Ch. 7 : Internet Transport Protocols, Part B

TCP reliable data transfer

• TCP creates reliable service on top of IP’s unreliable service
• pipelined segments
• cumulative acks
• single retransmission timer
• receiver accepts out-of-order segments but does not acknowledge them
• Retransmissions are triggered by timeout events
• Initially consider simplified TCP sender: ignore flow control, congestion control

TCP sender events:

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unACKed segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ACK rcvd:
• if it acknowledges previously unACKed segments:
  - update what is known to be ACKed
  - start timer if there are outstanding segments

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte

Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed

TCP actions on receiver events:

application takes data:
• free the room in buffer
• give the freed cells new numbers (circular numbering)
• WIN increases by the number of bytes taken

data rcvd from IP:
• if checksum fails, ignore segment
• if checksum OK, then:
  - if data came in order: update AN and WIN
    • AN grows by the number of new in-order bytes
    • WIN decreases by same #
  - if data out of order: put in buffer, but don’t count it for AN/WIN

TCP: retransmission scenarios

[Figure A, normal scenario: Host A sends SN=92 (8 bytes data) and SN=100 (20 bytes data); Host B returns AN=100 and AN=120; A starts a timer for each segment and stops it on the matching ACK.]

[Figure B, lost ACK + retransmission: A sends SN=92 (8 bytes data) and starts the timer for SN 92; B’s AN=100 is lost (X); after the TIMEOUT A resends SN=92 and starts a timer for the new SN 92, which stops when AN=100 arrives.]

Legend: timer setting; actual timer run.

TCP retransmission scenarios (more)

[Figure C, lost ACK, NO retransmission: A sends SN=92 (8 bytes data) and SN=100 (20 bytes data); AN=100 is lost (X) but AN=120 arrives, so the timer for SN 92 stops without any retransmission.]

[Figure D, premature timeout: A sends SN=92 (8 bytes data) and SN=100 (20 bytes data); the timer for SN 92 times out before the ACKs arrive, so A resends SN=92; AN=100 and AN=120 then arrive, and B answers the resent segment with a redundant ACK (AN=120).]

TCP ACK generation [RFC 1122, RFC 2581]

• Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
  TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no data segment to send, then send ACK.

• Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending.
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.

• Event at receiver: arrival of out-of-order segment with higher-than-expected seq. #; gap detected.
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte. This ACK carries no data & no new WIN.

• Event at receiver: arrival of segment that partially or completely fills gap.
  TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap.

Fast Retransmit

• time-out period often relatively long:
  - causes long delay before resending lost packet
• detect lost segments via duplicate ACKs
  - sender often sends many segments back-to-back
  - if a segment is lost, there will likely be many duplicate ACKs for that segment
• if sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

[Figure: Host A sends segments with seq # x1 to x5; the segment with seq # x2 is lost (X). Host B answers with ACK # x2 and then three more ACK # x2 (triple duplicate ACKs), so A resends seq # x2 before the timeout expires.]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    if (segment carries no data & doesn’t change WIN)
      increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
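The simplified sender with fast retransmit can be sketched as an event-driven class. This is an illustrative model only: the class name, the `wire` list (standing in for "pass segment to IP") and the boolean timer flag are assumptions, and flow control, congestion control and real timers are ignored, as on the slides.

```python
class SimplifiedSender:
    """Toy model of the simplified TCP sender events (no real timers or sockets)."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq      # oldest unACKed byte
        self.next_seq = initial_seq       # next byte-stream number to use
        self.unacked = {}                 # seq # -> payload still awaiting an ACK
        self.timer_running = False
        self.dup_acks = 0
        self.wire = []                    # (seq #, payload) handed to IP, in order

    def on_app_data(self, data: bytes):
        self.unacked[self.next_seq] = data
        self.wire.append((self.next_seq, data))
        if not self.timer_running:
            self.timer_running = True     # timer covers the oldest unACKed segment
        self.next_seq += len(data)

    def on_timeout(self):
        self._resend_oldest()
        self.timer_running = True         # restart timer

    def on_ack(self, y: int):
        if y > self.send_base:            # new data is cumulatively ACKed
            self.send_base = y
            self.dup_acks = 0
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
        elif self.unacked:                # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:
                self._resend_oldest()     # fast retransmit, before the timer expires

    def _resend_oldest(self):
        if self.unacked:
            seq = min(self.unacked)
            self.wire.append((seq, self.unacked[seq]))
```

For instance, after sending 8 bytes at SN 92 and 20 bytes at SN 100, an ACK of 100 followed by three more ACKs of 100 makes the sender resend the segment that starts at 100.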

TCP: Flow Control

TCP Flow Control for A’s data

• receive side of TCP connection at B has a receive buffer
• application process at B may be slow at reading from the buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• flow control matches the send rate of A to the receiving application’s drain rate at B
• receive buffer size set by OS at connection init
• WIN = window size = number of bytes A may send starting at AN

[Figure: node B, receive process. Data from IP (sent by TCP at A) enters the receive buffer; the buffer holds TCP data up to AN, followed by spare room of size WIN; data is taken out by the application.]

TCP Flow control: how it works

Formulas:
• AN = first byte not received yet
  - sent to A in TCP header
• AckedRange = AN - FirstByteNotReadByAppl = # bytes rcvd in sequence & not taken
• WIN = RcvBuffer - AckedRange = SpareRoom
• AN and WIN sent to A in TCP header
• data rcvd out of sequence is considered part of the ‘spare room’ range

Procedure:
• Rcvr advertises “spare room” by including the value of WIN in its segments
• Sender A is allowed to send at most WIN bytes in the range starting with AN
  - guarantees that the receive buffer doesn’t overflow

[Figure: node B, receive process. The Rcv Buffer holds ACKed data in buffer up to AN, then spare room of size WIN; non-ACKed data in buffer (arrived out of order) lies inside the spare room and is ignored; data from IP (sent by TCP at A) enters, and data is taken by the application.]
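The two formulas translate directly into code. The sketch below uses made-up function names and example numbers; it assumes byte counters that do not wrap around.

```python
def advertised_window(rcv_buffer: int, an: int, first_byte_not_read: int) -> int:
    """WIN = RcvBuffer - AckedRange, with AckedRange = AN - FirstByteNotReadByAppl."""
    acked_range = an - first_byte_not_read   # bytes received in sequence, not yet taken
    return rcv_buffer - acked_range


def sender_may_send(an: int, win: int, seq: int, length: int) -> bool:
    """Sender may only place bytes inside the range [AN, AN + WIN)."""
    return seq >= an and seq + length <= an + win


# Example values (illustrative): 4096-byte buffer, application has read up to byte 1000,
# in-order data received up to byte 1500.
win = advertised_window(rcv_buffer=4096, an=1500, first_byte_not_read=1000)
```

With these numbers `win` is 3596, so a segment starting at 1500 may carry at most 3596 bytes.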

TCP flow control: example 1

TCP flow control: example 2

TCP: setting timeouts

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  - note: RTT will vary
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions, cumulatively ACKed segments
• SampleRTT will vary, want estimated RTT “smoother”
  - use several recent measurements, not just current SampleRTT

High-level Idea

Set timeout = average + safe margin

Estimating Round Trip Time

EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT and Estimated RTT.]

Setting Timeout

Problem: using the average of SampleRTT will generate many timeouts due to network variations

Solution: EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin

DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT

[Figure: distribution of RTT values (x-axis: RTT, y-axis: freq.).]
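The estimator can be sketched as a small class using the usual weights of 0.125 for EstimatedRTT and 0.25 for DevRTT. Two details are assumptions not stated on the slides: the first sample initializes EstimatedRTT with DevRTT set to half of it, and DevRTT is updated before EstimatedRTT (both follow the convention of RFC 6298). The class name is made up.

```python
class RttEstimator:
    """EWMA-based timeout estimation; times are in milliseconds."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2       # assumption: RFC 6298-style start value

    def update(self, sample_rtt: float) -> float:
        """Feed one SampleRTT (never from a retransmitted segment); return the timeout."""
        # Assumption: DevRTT uses the EstimatedRTT from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a first sample of 100 ms, a second sample of 120 ms moves EstimatedRTT to 102.5 ms and DevRTT to 42.5 ms, giving a timeout interval of 272.5 ms.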

An Example TCP Session

TCP: Congestion Control

TCP Congestion Control

• Closed-loop, end-to-end, window-based congestion control
• Designed by Van Jacobson in the late 1980s, based on the AIMD alg. of Dah-Ming Chiu and Raj Jain
• Works well so far: the bandwidth of the Internet has increased by more than 200,000 times
• Many versions
  - TCP/Tahoe: this is a less optimized version
  - TCP/Reno: many OSs today implement Reno-type congestion control
  - TCP/Vegas: not currently used

For more details: see TCP/IP Illustrated; or read http://lxr.linux.no/source/net/ipv4/tcp_input.c for the Linux implementation

TCP & AIMD: congestion

• Dynamic window size [Van Jacobson]
  - Initialization: MI (multiplicative increase)
    • Slow start
  - Steady state: AIMD
    • Congestion Avoidance
• Congestion = timeout
  - TCP Tahoe
• Congestion = timeout || 3 duplicate ACKs
  - TCP Reno & TCP New Reno
• Congestion = higher latency
  - TCP Vegas

Visualization of the Two Phases

[Figure: Congwin (in MSS) over time: slow start up to the threshold, then congestion avoidance.]

TCP Slowstart: MI

Slowstart algorithm:

initialize: Congwin = 1 MSS
for (each segment ACKed)
  Congwin = Congwin + MSS
until (congestion event OR Congwin > threshold)

• exponential increase (per RTT) in window size (not so slow!)
• in case of timeout: threshold = Congwin/2

[Figure: Host A sends one segment to Host B; one RTT later it sends two segments, then four segments.]

TCP Tahoe Congestion Avoidance

Congestion avoidance:

/* slowstart is over */
/* Congwin > threshold */
Until (timeout) {   /* loss event */
  on every ACK:
    Congwin/MSS += 1/(Congwin/MSS)
}
threshold = Congwin/2
Congwin = 1 MSS
perform slowstart

TCP Reno

• Fast retransmit:
  - try to avoid waiting for timeout
• Fast recovery:
  - try to avoid slowstart
  - used only on triple duplicate event
  - single packet drop: not too bad

TCP Reno cwnd Trace

[Figure: congestion window (0 to 70) versus time (0 to 60), with the threshold, timeouts, triple duplicate ACKs and fast retransmission marked; slow start periods alternate with congestion avoidance (CA, additive increase) periods.]

TCP congestion control: bandwidth probing

• “probing for bandwidth”: increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate
  - continue to increase on ACK, decrease on loss (since available bandwidth is changing, depending on other connections in network)
• Q: how fast to increase/decrease?
  - details to follow

[Figure: TCP’s “sawtooth” behavior: sending rate versus time. The rate rises while ACKs are being received and drops at each loss (X).]

TCP Congestion Control: details

• sender limits rate by limiting the number of unACKed bytes “in pipeline”:

  LastByteSent - LastByteAcked ≤ cwnd

  - cwnd: differs from rwnd (how, why?)
  - sender limited by min(cwnd, rwnd)
• roughly,

  rate = cwnd / RTT bytes/sec

• cwnd is dynamic, function of perceived network congestion

[Figure: the sender transmits cwnd bytes, then waits one RTT for the ACK(s).]

TCP Congestion Control: more details

segment loss event: reducing cwnd
• timeout: no response from receiver
  - cut cwnd to 1 MSS
• 3 duplicate ACKs: at least some segments getting through (recall fast retransmit)
  - cut cwnd in half, less aggressively than on timeout

ACK received: increase cwnd
• slowstart phase:
  - start low (cwnd = MSS)
  - increase cwnd exponentially fast (despite name)
  - used: at connection start, or following timeout
• congestion avoidance:
  - increase cwnd linearly

TCP Slow Start

• when connection begins, cwnd = 1 MSS
  - example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• increase rate exponentially until first loss event or when threshold reached
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received

[Figure: Host A sends one segment to Host B; one RTT later it sends two segments, then four segments.]

TCP: congestion avoidance

• when cwnd > ssthresh grow cwnd linearly
  - increase cwnd by 1 MSS per RTT
  - approach possible congestion slower than in slowstart
  - implementation: cwnd = cwnd + MSS^2/cwnd for each ACK received

AIMD: Additive Increase Multiplicative Decrease
• ACKs: increase cwnd by 1 MSS per RTT: additive increase
• loss: cut cwnd in half (non-timeout-detected loss): multiplicative decrease
  - true in macro picture; may require Slow Start first to grow up to this

TCP congestion control FSM: overview

States: slow start, congestion avoidance, fast recovery.
• slow start -> congestion avoidance: cwnd > ssthresh
• slow start -> slow start: loss: timeout
• slow start -> fast recovery: loss: 3dupACK
• congestion avoidance -> slow start: loss: timeout
• congestion avoidance -> fast recovery: loss: 3dupACK
• fast recovery -> slow start: loss: timeout
• fast recovery -> congestion avoidance: new ACK

TCP congestion control FSM: details

Initially (entering slow start): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0

Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance:
• new ACK: cwnd = cwnd + MSS*(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start

Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
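The state machine can be sketched as a class that tracks only cwnd, ssthresh and the current state. The class and state names are hypothetical, retransmission itself is omitted, and MSS = 1000 bytes is just an example value.

```python
class RenoCwnd:
    """Toy model of the Reno congestion-control FSM (window bookkeeping only)."""

    def __init__(self, mss: int = 1000, ssthresh: float = 64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)            # cwnd = 1 MSS
        self.ssthresh = float(ssthresh)
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh     # deflate the window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss         # +1 MSS per ACK: doubles every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:            # loss detected by triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):                 # loss detected by timeout
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding the class a sequence of new ACKs, three duplicate ACKs and a timeout reproduces the three kinds of window change described above: growth, halving, and reset to 1 MSS.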

Popular “flavors” of TCP

[Figure: cwnd window size (in segments) versus transmission round for TCP Tahoe and TCP Reno, with the two ssthresh levels marked.]

Summary: TCP Congestion Control

when cwnd < ssthresh, sender in slow-start phase, window grows exponentially.

when cwnd >= ssthresh, sender is in congestion-avoidance phase, window grows linearly.

when triple duplicate ACK occurs, ssthresh set to cwnd/2, cwnd set to ~ ssthresh

when timeout occurs, ssthresh set to cwnd/2, cwnd set to 1 MSS.

TCP throughput

• Q: what’s the average throughput of TCP as a function of window size and RTT?
  - ignoring slow start
• let W be the window size when loss occurs
  - when the window is W, throughput is W/RTT
  - just after loss, the window drops to W/2, throughput to W/(2 RTT)
  - average throughput: 0.75 W/RTT
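The factor 0.75 can be checked with a short calculation, assuming that between two losses the window grows linearly from W/2 back to W, so that its time average is the midpoint of the two values:

```latex
\text{average throughput}
  = \frac{1}{RTT}\cdot\frac{\tfrac{W}{2} + W}{2}
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
```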

TCP Fairness

fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

Two competing sessions (Tahoe, Slow Start ignored):
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: connection 2 throughput versus connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line. The operating point alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”.]

• start at (a, b)
• additive increase: (a+t, b+t) => y = x + (b-a)
• after a loss and further increase: (a/2 + t1/2 + t, b/2 + t1/2 + t) => y = x + (b-a)/2
• after the next loss: y = x + (b-a)/4
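The halving of the gap between the two connections can be reproduced with a small simulation. It is a simplified model with made-up names: both flows add one unit per round, both lose at the same moment when their sum exceeds the capacity, and slow start is ignored.

```python
def aimd_two_flows(a: float, b: float, capacity: float, rounds: int):
    """Simulate two synchronized AIMD flows; return final rates and the gap after each loss."""
    gaps = [abs(a - b)]
    for _ in range(rounds):
        if a + b > capacity:          # shared loss event: multiplicative decrease
            a, b = a / 2, b / 2
            gaps.append(abs(a - b))
        else:                         # additive increase: gap stays the same
            a, b = a + 1, b + 1
    return a, b, gaps


final_a, final_b, gaps = aimd_two_flows(10.0, 50.0, capacity=100.0, rounds=200)
```

Each loss halves the difference between the two rates (40, 20, 10, ...), which is the y = x + (b-a)/2, y = x + (b-a)/4 progression toward the equal-share line.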

Fairness (more)

Fairness and UDP
• multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• web browsers do this
• example: link of rate R already supporting 9 connections;
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections) !!

Exercise

• MSS = 1000
• Only one event per row