Chapter 7: Internet Transport Protocols (Transport Layer)
Chapter 7: Internet Transport Protocols

Transport Layer: Our goals
- understand principles behind transport-layer services:
  - multiplexing/demultiplexing data streams of several applications
  - reliable data transfer
  - flow control
  - congestion control
- Chapter 6: rdt principles; Chapter 7: multiplexing/demultiplexing and the Internet transport-layer protocols:
  - UDP: connectionless transport
  - TCP: connection-oriented transport (connection setup, data transfer, flow control, congestion control)
Transport vs. network layer

The transport layer uses network-layer services and adds value to them.

Transport layer: logical communication between processes; exists only in hosts; ignores the network; port numbers are used for "routing" to the intended process inside the destination computer.
Network layer: logical communication between hosts; exists in hosts and in routers; routes data through the network; IP addresses are used for routing in the network.
Socket Multiplexing/Demultiplexing

[Figure: three hosts, each with the full protocol stack (application, transport, network, link, physical); application processes P1-P4 are attached to the transport layer through sockets.]

Multiplexing at the sending host: gather data from multiple sockets, envelop the data with headers (later used for demultiplexing), pass to L3.
Demultiplexing at the receiving host: receive segments from L3, deliver each received segment to the right socket.
How demultiplexing works

- host receives IP datagrams
- each datagram has source and destination IP addresses in its header (used by the network to get it there)
- each datagram carries one transport-layer segment
- each segment has source and destination port numbers in its header
- host uses the port numbers (*) to direct the segment to the correct socket; from the socket, the data gets to the relevant application process

[Figure: TCP/UDP segment format, 32 bits wide: source port # and dest port #, other header fields, application data (message). The segment travels inside an IP packet whose header carries the source and dest IP addresses.]

(*) to find a TCP socket on a server, the source & dest. IP addresses are also needed; see details later
Connectionless demultiplexing (UDP)

Processes create sockets with port numbers; a UDP socket is identified by a pair of numbers: (my IP address, my port number).

A client that decides to contact a server application puts the server's (peer) IP address and the application's well-known port ("WKP") into the UDP packet it sends:
- dest IP address: in the IP header of the packet
- dest port number: in its UDP header

When the server receives a UDP segment, it:
- checks the destination port number in the segment
- directs the UDP segment to the socket with that port number
  - a single server socket per application type
  - packets from different remote sockets are directed to the same socket
- the message waits in the socket queue and is processed in its turn
- the answer is sent to the client socket (listed in the Source fields of the query packet)

Realtime UDP applications have individual server sockets per client. Their port numbers are distinct, since they are coordinated in advance by some signaling protocol. This is possible because the port number is not used to specify the application.
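The delivery rule above can be sketched in a few lines. This is an invented illustration (the names `UdpDatagram`, `demultiplex`, and `reply_address` are not a real kernel API): the receiver looks only at the destination port, and the source fields serve as the return address.

```python
# Sketch of UDP demultiplexing: deliver a datagram to the socket bound to
# its destination port. All names here are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class UdpDatagram:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes

def demultiplex(sockets: dict, dgram: UdpDatagram):
    """sockets maps a local port number -> list used as the socket's queue.
    Returns that queue, or None if no socket is bound to dgram.dst_port."""
    queue = sockets.get(dgram.dst_port)
    if queue is not None:
        queue.append(dgram)          # msg waits in the socket queue
    return queue

def reply_address(dgram: UdpDatagram):
    """The answer goes back to the Source fields of the query packet."""
    return (dgram.src_ip, dgram.src_port)

# Two different remote sockets send to the same server port 53:
server_sockets = {53: []}
demultiplex(server_sockets, UdpDatagram("A", 9157, "C", 53, b"query-1"))
demultiplex(server_sockets, UdpDatagram("B", 5775, "C", 53, b"query-2"))
```

Note that both datagrams land in the same queue: with UDP, the destination port alone selects the socket, regardless of who sent the packet.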
Connectionless demux (cont)

SP = source port number, DP = destination port number, S-IP = source IP address, D-IP = destination IP address

[Figure: client process P1 at host A (socket: port 9157, IP A) and a client at host B (socket: port 5775, IP B) both send to the server socket (port 53, IP C) at host C. The query packets carry (SP 9157, DP 53, S-IP A, D-IP C) and (SP 5775, DP 53, S-IP B, D-IP C) in their UDP and IP headers; the replies reverse the fields: (SP 53, DP 9157, S-IP C, D-IP A) and (SP 53, DP 5775, S-IP C, D-IP B). SP and S-IP provide the "return address".]
Connection-oriented demux (TCP)

A TCP socket is identified by a 4-tuple:
- local (my) IP address
- local (my) port number
- remote (peer) IP address
- remote (peer) port number

A host receiving a packet uses all four values to direct the segment to the appropriate socket.

A server host may support many simultaneous TCP sockets: each socket is identified by its own 4-tuple. A web server dedicates a different socket to each connecting client; if you open two browser windows, you generate 2 sockets at each end.
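The 4-tuple lookup can be sketched directly. This is a toy dictionary model (the names `tcp_demux` and the socket labels are invented): unlike UDP, two connections arriving at the same server port from different remote endpoints map to different sockets.

```python
# Sketch of TCP demultiplexing: key on the full 4-tuple
# (local IP, local port, remote IP, remote port). Names invented.
def tcp_demux(sockets, segment):
    """sockets: dict keyed by (local_ip, local_port, remote_ip, remote_port).
    segment: dict with src_ip/src_port/dst_ip/dst_port as seen at the server."""
    key = (segment["dst_ip"], segment["dst_port"],
           segment["src_ip"], segment["src_port"])
    return sockets.get(key)

# Server C, port 80, with three working sockets (as in the next slide):
server_sockets = {
    ("C", 80, "A", 9157): "socket-for-A",
    ("C", 80, "B", 9157): "socket-for-B",   # same remote port, different IP
    ("C", 80, "B", 5775): "socket-for-B2",
}
```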
Connection-oriented demux (cont)

LP = Local Port, RP = Remote Port, L-IP = Local IP, R-IP = Remote IP; "L" = Local = My, "R" = Remote = Peer

[Figure: three clients connect to a web server at IP C, port 80. Client sockets: (LP 9157, L-IP A, RP 80, R-IP C), (LP 5775, L-IP B, RP 80, R-IP C), and (LP 9157, L-IP B, RP 80, R-IP C). The matching server sockets are (LP 80, L-IP C, RP 9157, R-IP A), (LP 80, L-IP C, RP 5775, R-IP B), and (LP 80, L-IP C, RP 9157, R-IP B). A packet such as (SP 9157, DP 80, S-IP B, D-IP C) is delivered by matching all four fields, so the two connections with remote port 9157 (one from IP A, one from IP B) go to different server sockets.]
Connection-oriented Sockets

- a client socket has a port number unique in its host; a packet for a client socket is directed by the host OS based on the dest. port only
- each server application has an always-active waiting socket; that socket receives all packets not belonging to any established connection
  - these are the packets that open new connections
- when the waiting socket accepts a 'new connection' segment, a new socket is generated at the server with the same port number
  - this is the working socket for that connection
- subsequent segments arriving at the server on that connection are directed to the working socket
  - the socket is identified using all 4 identifiers
- the last slide shows the working sockets on the server side

Note: Client IP + Client Port are globally unique.
UDP Protocol

UDP: User Datagram Protocol [RFC 768]
- simple transport protocol
- "best effort" service: UDP segments may be lost or delivered out of order to the application, with no correction by UDP
- UDP will discard bad-checksum segments if so configured by the application
- connectionless: no handshaking between UDP sender and receiver; each UDP segment is handled independently of the others

Why is there a UDP?
- no connection establishment: saves delay
- no congestion control: better delay & bandwidth
- simple: less memory and processing; small segment header
- typical usage: realtime applications, which are loss tolerant and rate sensitive
- other uses (why?): DNS, SNMP
UDP segment structure

[Figure: UDP segment format, 32 bits wide: source port # and dest port #; length (total length of segment, in bytes) and checksum; then application data (variable length).]

Checksum computed over:
- the whole segment, plus
- part of the IP header: both IP addresses, the protocol field, and the total IP packet length

Checksum usage:
- computed at the destination to detect errors; on error, discard the segment
- the checksum is optional: if not used, the sender puts checksum = all zeros
- if the computed checksum is zero, the sender puts all ones
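The checksum computation itself is a 16-bit one's-complement sum of 16-bit words, in the style of RFC 1071. A minimal sketch (input framing such as the pseudo-header is left to the caller):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    checksum = ~total & 0xFFFF
    # "computed zero => send all ones": 0x0000 is reserved for 'no checksum'
    return 0xFFFF if checksum == 0 else checksum
```

A handy property for verification: adding the checksum to the one's-complement sum of the data yields 0xFFFF, which is how the receiver detects errors.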
TCP Protocol
TCP: Overview  [RFCs: 793, 1122, 1323, 2018, 2581]

- point-to-point: one sender, one receiver; works between sockets
- reliable, in-order byte stream: no "message boundaries"
- pipelined: TCP congestion and flow control set the window size
- full-duplex data: bi-directional data flow in the same connection; MSS: maximum segment size
- connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
- flow controlled: the sender will not overwhelm the receiver
- send & receive buffers

[Figure: the application writes data through the socket door into the TCP send buffer; TCP forms segments; at the receiver, data waits in the TCP receive buffer until the application reads it.]
TCP segment structure

[Figure: TCP segment format, 32 bits wide: source port # and dest port #; sequence number; acknowledgement number; header length (in 32-bit words), unused bits, flags (URG, ACK, PSH, RST, SYN, FIN), receiver window size; checksum and urgent-data pointer; options (variable length); application data (variable length).]

- sequence and acknowledgement numbers count by bytes of data (not segments!)
- the checksum is an Internet checksum (as in UDP)
- the receiver window size is the # of bytes the receiver is willing to accept

FLAGS:
- URG: indicates start of urgent data
- ACK: ACK # valid
- PSH: indicates urgent data ends in this segment; ptr = end of urgent data (PSH and URG are seldom used and not clearly defined)
- SYN: initialize connection, synchronize SN
- FIN: I wish to disconnect
- RST: break connection immediately
TCP sequence # (SN) and ACK # (AN)

- SN: byte-stream "number" of the first byte in the segment's data
- AN: SN of the next byte expected from the other side; it is a cumulative ACK
- Q: how does the receiver handle out-of-order segments? It puts them in the receive buffer but does not acknowledge them.

Simple data transfer scenario (some time after connection setup):
- host A sends 100 data bytes: SN=42, AN=79, 100 data bytes
- host B ACKs the 100 bytes and sends 50 data bytes: SN=79, AN=142, 50 data bytes
- host A ACKs receipt of the data and sends no data (WHY?): SN=142, AN=129, no data
Connection Setup: Objective

Agree on initial sequence numbers:
- a sender should not reuse a seq # before it is sure that all packets with that seq # are purged from the network
  - the network guarantees that a too-old packet will be purged: the network bounds the lifetime of each packet
- to avoid waiting for old packets to disappear, choose the initial SN (ISN) far away from the previous session's
  - this needs connection setup, so that the sender tells the receiver its initial seq #

Agree on other initial parameters, e.g. Maximum Segment Size.
TCP Connection Management

Setup: establish a connection between the hosts before exchanging data segments; called a 3-way handshake
- initialize TCP variables: seq. #s; buffers, flow-control info (e.g. RcvWindow)
- client (connection initiator): opens a socket and commands the OS to connect it to the server
- server (contacted by the client): has a waiting socket; accepts the connection; generates a working socket

Teardown: end of connection (we skip the details)

Three-way handshake:
Step 1: client host sends a TCP SYN segment to the server; specifies the initial seq # (ISN); no data
Step 2: server host receives the SYN and replies with a SYNACK segment (also no data); allocates buffers; specifies the server's initial SN & window size
Step 3: client receives the SYNACK and replies with an ACK segment, which may contain data
TCP Three-Way Handshake (TWH)

[Figure: hosts A and B, each with a send buffer and a receive buffer. A sends SYN, SN = X; B answers SYNACK, SN = Y, AN = X+1; A completes with ACK, SN = X+1, AN = Y+1. After the exchange, A's next SN is X+1 and it expects Y+1; B's next SN is Y+1 and it expects X+1.]
Connection Close

Objective of the closure handshake: each side can release resources and remove state about the connection, then close the socket.

[Figure: the client closes first and sends FIN ("I am done. Are you done too?"); no more data flows from the client. The server eventually closes too and sends its own FIN ("I am done too. Goodbye!"). The server releases its resources after its FIN is acknowledged; the side that closed first must still decide when it is safe to release its resources.]
TCP reliable data transfer

- TCP creates a reliable service on top of IP's unreliable service
- pipelined segments
- cumulative ACKs
- a single retransmission timer
- the receiver accepts out-of-order segments but does not acknowledge them
- retransmissions are triggered by timeout events; in some versions of TCP, also by triple duplicate ACKs (see later)
- initially, consider a simplified TCP sender: ignore flow control and congestion control
TCP sender events:

data rcvd from app:
- create a segment with a seq #; the seq # is the byte-stream number of the first data byte in the segment
- start the timer if it is not already running (the timer relates to the oldest unACKed segment)
- expiration interval: TimeOutInterval

timeout (*):
- retransmit the segment that caused the timeout
- restart the timer

ACK rcvd:
- if the ACK acknowledges previously unACKed segments, update what is known to be ACKed (note: the ACK is cumulative)
- start the timer if there are still outstanding segments

(*) retransmission is done also on triple duplicate ACK (see later)
TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
      switch(event)

      event: data received from application above
        if (NextSeqNum - SendBase < N) {
          create TCP segment with sequence number NextSeqNum
          if (timer currently not running) start timer
          pass segment to IP
          NextSeqNum = NextSeqNum + length(data)
        }
        else reject data  /* in truth: keep in send buffer until a new ACK */

      event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

      event: ACK received, with ACK field value of y
        if (y > SendBase) {
          SendBase = y
          if (there are currently not-yet-acknowledged segments) start timer
        }
    } /* end of loop forever */

Comment: SendBase - 1 is the last cumulatively ACKed byte.
Example: SendBase - 1 = 71; y = 73, so the receiver wants 73+; y > SendBase, so new data is ACKed.
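The sender loop above can be made runnable as a small event-driven sketch. The class below is an invented illustration (window in bytes, timer modeled as a boolean, segments kept in a dict); it follows the pseudocode's three events but is not a real TCP implementation.

```python
# Runnable sketch of the simplified TCP sender loop (invented names).
class TcpSender:
    def __init__(self, isn=0, window=4000):
        self.next_seq = isn          # NextSeqNum
        self.send_base = isn         # SendBase
        self.window = window         # N
        self.timer_running = False
        self.unacked = {}            # seq -> data of not-yet-ACKed segments

    def app_data(self, data: bytes) -> bool:
        """Event: data received from the application above."""
        if self.next_seq - self.send_base + len(data) > self.window:
            return False             # in truth: keep in send buffer
        self.unacked[self.next_seq] = data
        if not self.timer_running:   # timer relates to oldest unACKed segment
            self.timer_running = True
        self.next_seq += len(data)   # "pass segment to IP" not modeled
        return True

    def timeout(self) -> int:
        """Event: timer timeout. Returns the seq # to retransmit."""
        oldest = min(self.unacked)   # smallest unACKed sequence number
        self.timer_running = True    # restart timer
        return oldest

    def ack(self, y: int) -> None:
        """Event: cumulative ACK received with value y."""
        if y > self.send_base:
            self.send_base = y
            self.unacked = {s: d for s, d in self.unacked.items()
                            if s + len(d) > y}
            self.timer_running = bool(self.unacked)
```

Replaying the earlier retransmission scenario: send 8 bytes at SN 92 and 20 bytes at SN 100, time out (retransmit 92), then receive cumulative ACKs 100 and 120.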
TCP actions on receiver events:

application takes data:
- free the room in the buffer; give the freed cells new numbers (circular numbering)
- WIN increases by the number of bytes taken

data rcvd from IP:
- if the checksum fails, ignore the segment
- if the checksum is OK:
  - if the data came in order, update AN & WIN as follows: AN grows by the number of new in-order bytes; WIN decreases by the same number
  - if the data is out of order: put it in the buffer, but don't count it for AN/WIN
TCP: retransmission scenarios

[Figure A, normal scenario: A sends SN=92 (8 bytes of data) and starts the timer for SN 92; B replies AN=100 and A stops the timer. A then sends SN=100 (20 bytes) and starts the timer for SN 100; B replies AN=120 and A stops the timer; no timer runs afterwards.]

[Figure B, lost ACK + retransmission: A sends SN=92 (8 bytes) and starts the timer for SN 92; B's AN=100 is lost. At TIMEOUT, A starts a new timer and retransmits SN=92 (8 bytes); B replies AN=100 again, and A stops the timer.]
TCP retransmission scenarios (more)

[Figure C, lost ACK, NO retransmission: A sends SN=92 (8 bytes, timer started for SN 92) and then SN=100 (20 bytes). B's AN=100 is lost, but its cumulative AN=120 arrives before the timeout, so A stops the timer; no retransmission is needed.]

[Figure D, premature timeout: A sends SN=92 (8 bytes) and then SN=100 (20 bytes). The timer for SN 92 expires before B's AN=100 arrives, so A retransmits SN=92 and restarts the timer; when AN=100 then arrives, A restarts the timer for SN 100. B receives the duplicate SN=92 segment, DROPS it, and sends a redundant ACK (AN=120); A stops the timer.]
TCP ACK generation (Receiver rules)  [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action:

- Arrival of in-order segment with expected seq #; all data up to the expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for the next segment; if none arrives in that interval, send the ACK

- Arrival of in-order segment with expected seq #; one other segment has an ACK pending
  -> Immediately send a single cumulative ACK, ACKing both in-order segments

- Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send a duplicate ACK, indicating the seq # of the next expected byte (this ACK carries no data & no new WIN)

- Arrival of segment that partially or completely fills a gap
  -> Immediately send an ACK, provided the segment starts at the lower end of the first gap
Fast Retransmit (Sender Rules)

- the timeout period is often relatively long: it causes a long delay before resending a lost packet
- idea: detect lost segments via duplicate ACKs
  - the sender often sends many segments back-to-back
  - if a segment is lost, there will likely be many duplicate ACKs for that segment
- Rule: if the sender receives 4 ACKs for the same data (= 3 duplicates), it assumes that the segment after the ACKed data was lost
  - fast retransmit: resend the segment immediately (before the timer expires)

[Figure, Fast Retransmit scenario: host A sends segments with seq #s x1..x5 back-to-back; x2 is lost. B sends ACK # x2 for x1 and then three duplicate ACK # x2 (no data in segment, no window change) for x3, x4, x5. On the triple duplicate, A resends seq x2 before its timeout expires.]
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments) start timer
      }
      else if (segment carries no data && doesn't change WIN) {
        /* a duplicate ACK for an already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
          resend segment with sequence number y   /* fast retransmit */
          count of dup ACKs received for y = 0
        }
      }
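The duplicate-counting rule can be isolated in a small sketch. The function below is an invented illustration (names `make_counter`, `on_ack`): the first ACK for a value is "new", the next three pure duplicates (no data, no window change) count up, and the third duplicate triggers fast retransmit.

```python
# Sketch of the duplicate-ACK rule: a 4th ACK for the same data
# (= 3 duplicates) triggers fast retransmit. Names invented.
def make_counter():
    return {"dups": 0}

def on_ack(state, y, send_base, carries_data=False, win_changed=False):
    """Classify ACK value y: 'new', 'dup', or 'fast_retransmit'."""
    if y > send_base:
        state["dups"] = 0
        return "new"
    if carries_data or win_changed:
        return "dup"                 # not a pure duplicate: don't count it
    state["dups"] += 1
    if state["dups"] == 3:
        state["dups"] = 0
        return "fast_retransmit"     # resend segment y before timer expires
    return "dup"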
TCP: setting timeouts
General idea

Q: how to set the TCP timeout interval?
- it should be longer than the RTT, but the RTT varies
- if too short: premature timeouts, unnecessary retransmissions
- if too long: slow reaction to segment loss

Set timeout interval = average RTT + safety margin.
Estimating Round Trip Time

- SampleRTT: measured time from segment transmission until receipt of the ACK for it
- SampleRTT varies; we want a "smoother" estimated RTT, so use several recent measurements, not just the current SampleRTT:

    EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT

- this is an exponential weighted moving average: the influence of a past sample decreases exponentially fast
- typical value: α = 0.125
[Figure: SampleRTT and EstimatedRTT (milliseconds, roughly 100-350) over time (seconds), for gaia.cs.umass.edu to fantasia.eurecom.fr; EstimatedRTT is visibly smoother than the raw samples.]
Setting Timeout

Problem: using just the average of SampleRTT will generate many timeouts, due to network variations.
Solution: EstimatedRTT plus a "safety margin": a large variation in EstimatedRTT requires a larger safety margin.

Estimate the average deviation of the RTT:

    DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|    (typically, β = 0.25)

Then set the timeout interval:

    TimeoutInterval = EstimatedRTT + 4*DevRTT
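The two EWMA updates translate directly into code. A minimal sketch, assuming α = 0.125, β = 0.25, an initial DevRTT of half the first sample, and DevRTT updated before EstimatedRTT (as RFC 6298 does):

```python
# Sketch of TCP's RTT estimation (alpha = 0.125, beta = 0.25).
class RttEstimator:
    def __init__(self, first_sample: float):
        self.estimated = first_sample
        self.dev = first_sample / 2      # common initial choice (assumption)

    def update(self, sample: float):
        # DevRTT = (1 - beta)*DevRTT + beta*|SampleRTT - EstimatedRTT|
        self.dev = 0.75 * self.dev + 0.25 * abs(sample - self.estimated)
        # EstimatedRTT = (1 - alpha)*EstimatedRTT + alpha*SampleRTT
        self.estimated = 0.875 * self.estimated + 0.125 * sample

    @property
    def timeout_interval(self) -> float:
        return self.estimated + 4 * self.dev
```

Note how a perfectly steady RTT shrinks DevRTT toward zero, so the safety margin adapts to the observed variation.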
TCP: Flow Control
TCP Flow Control: Simple Case

- TCP at A sends data to B; the figure below shows the TCP receive buffer at B
- flow control matches the send rate of A to the receiving application's drain rate at B
- the receive buffer size is set by the OS at connection init
- WIN = window size = the number of bytes A may send, starting at AN
- the application process at B may be slow at reading from the buffer
- the sender won't overflow the receiver's buffer by transmitting too much, too fast

[Figure: node B's receive buffer. Data from IP (sent by TCP at A) enters at AN; TCP data already in the buffer waits until it is taken by the application; the remaining spare room is WIN.]
TCP Flow Control: General Case

Formulas:
- AN = first byte not received yet; sent to A in the TCP header
- AckedRange = AN - FirstByteNotReadByAppl = # bytes received in sequence but not yet taken by the application
- WIN = RcvBuffer - AckedRange = "spare room"
- AN and WIN are sent to A in the TCP header
- data received out of sequence is considered part of the 'spare room' range

Procedure:
- the receiver advertises the "spare room" by including the value of WIN in its segments
- sender A is allowed to send at most WIN bytes in the range starting with AN; this guarantees that the receive buffer doesn't overflow

[Figure: node B's receive buffer: ACKed data (received in sequence, not yet taken by the application), then spare room starting at AN; non-ACKed data in the buffer (arrived out of order) lies inside the spare-room range and is ignored by the WIN computation.]
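The WIN formula is a one-liner; a sketch with invented names, using byte numbers for AN and FirstByteNotReadByAppl:

```python
# Receiver-side window computation from the formulas above (names invented).
def receiver_window(rcv_buffer: int, an: int, first_byte_not_read: int) -> int:
    """WIN = RcvBuffer - AckedRange, AckedRange = AN - FirstByteNotReadByAppl."""
    acked_range = an - first_byte_not_read   # in-order bytes not yet taken
    return rcv_buffer - acked_range
```

The assertions below check the two directions of movement: WIN grows when the application takes data (FirstByteNotReadByAppl advances) and shrinks when in-order data arrives (AN advances).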
TCP Flow Control: Example 1 [figure-only slide]

TCP Flow Control: Example 2 [figure-only slide]
TCP: Congestion Control
TCP Congestion Control Overview (1)

- closed-loop, end-to-end, window-based congestion control
- designed by Van Jacobson in the late 1980s, based on the AIMD algorithm of Dah-Ming Chiu and Raj Jain
- has worked well so far: the bandwidth of the Internet has increased by more than 200,000 times
- many versions:
  - TCP Tahoe: a less optimized version
  - TCP Reno: many OSs today implement Reno-type congestion control
  - TCP Vegas: not currently used
- for more details see Stevens, TCP/IP Illustrated; K-R chapter 6.7; or read http://lxr.linux.no/source/net/ipv4/tcp_input.c for the Linux implementation
TCP Congestion Control Overview (2)

Dynamic window size [Van Jacobson]:
- initialization: MI (Multiplicative Increase): "slow start"
- steady state: AIMD (Additive Increase / Multiplicative Decrease): "congestion avoidance"
- "congestion is a timeout || 3 duplicate ACKs":
  - TCP Tahoe treats both cases identically
  - TCP Reno treats each case differently
- "congestion = (also) higher latency": TCP Vegas
General method: the sender limits its rate by limiting the number of unACKed bytes "in the pipeline":

    LastByteSent - LastByteAcked <= cwnd

- cwnd differs from WIN (how? why?)
- the sender is limited by ewnd = min(cwnd, WIN)  (the effective window)
- roughly, rate = ewnd / RTT bytes/sec
- cwnd is dynamic, a function of perceived network congestion

[Figure: the sender may have up to cwnd bytes outstanding per RTT; the returning ACKs pace the sending.]
The Basic Two Phases

[Figure: cwnd (in MSS) vs time: slow start (multiplicative increase) grows cwnd exponentially from 1 MSS, then congestion avoidance (additive increase) grows it linearly.]
Pure AIMD: Bandwidth Probing Principle

"Probing for bandwidth": increase the transmission rate on receipt of ACKs until loss eventually occurs, then decrease the transmission rate; continue to increase on ACK and decrease on loss (since the available bandwidth changes, depending on the other connections in the network).
- ACKs being received: increase the rate slowly
- loss: decrease the rate fast

Q: how fast to increase/decrease? details to follow

[Figure: sending rate vs time: TCP's "sawtooth" behavior, with linear climbs and a sharp drop at each loss (X). This model ignores slow start.]
TCP Slow Start: MI

Slow start algorithm:

    initialize: cwnd = 1 MSS
    for (each segment ACKed)
        cwnd += MSS                          (*)
    until (congestion event OR cwnd >= threshold)

    on congestion event:
        threshold = cwnd/2
        cwnd = 1 MSS

(*) cwnd is doubled per RTT: exponential increase in window size (very fast!), therefore slow start lasts a short time.

[Figure: host A sends one segment, waits an RTT for the ACK, then sends two segments, then four, doubling each RTT.]

* slow start is used in all TCP versions
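The per-ACK increment really does double cwnd each RTT: a window of k segments produces k ACKs, each adding one MSS. A small sketch (invented name `slow_start_rtts`) that records cwnd at the start of each RTT:

```python
# Slow start sketch: cwnd += MSS per ACK => cwnd doubles every RTT.
def slow_start_rtts(mss: int, ssthresh: int):
    """Return cwnd values (in bytes) at the start of successive RTTs."""
    cwnd, trace = mss, []
    while cwnd < ssthresh:
        trace.append(cwnd)
        acks_this_rtt = cwnd // mss      # one ACK per segment sent
        cwnd += acks_this_rtt * mss      # cwnd += MSS on each ACK
    trace.append(cwnd)
    return trace
```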
TCP: congestion avoidance (CA)

When cwnd > ssthresh, grow cwnd linearly:
- as long as all ACKs arrive, increase cwnd by ~1 MSS per RTT
- this approaches possible congestion more slowly than slow start does
- implementation: cwnd += MSS*(MSS/cwnd) for each ACK received

AIMD:
- ACKs: increase cwnd by 1 MSS per RTT: additive increase
- loss (*): cut cwnd in half: multiplicative decrease
- this is the macro picture; the actual algorithm may run Slow Start first to grow up to this value (+)

(*) = timeout or 3 duplicate ACKs
(+) depends on the case & the TCP type
TCP Tahoe

- initialize in the SlowStart state with cwnd = 1 MSS
- when cwnd >= ssthresh, change to the CA state
- when congestion is sensed (*): set ssthresh = ewnd/2 (+), set cwnd = 1 MSS, and change state to SlowStart

(*) timeout or triple duplicate ACK
(+) recall ewnd = min(cwnd, WIN); in our discussion here we assume WIN > cwnd, so ewnd = cwnd

[Figure: Tahoe cwnd trace: slow-start (MI) ramps; each congestion event (T/O or 3 Dup) resets cwnd to 1 MSS; additive increase (CA) above ssthresh.]
TCP Reno

Rationale:
- a triple-duplicate event indicates less congestion than a timeout: the first segment was probably lost, but some others arrived
- therefore on 3 Dup, cwnd is decreased to ewnd/2, skipping the SlowStart stage; this is less aggressive than on T/O
- this is an approximate description; more details follow

TCP Reno procedure:
- initialize with SlowStart
- slow start as in Tahoe
- CA growth as in Tahoe
- on T/O, act as in Tahoe
- on triple duplicate: set ssthresh = ewnd/2 and enter the Fast Recovery state
  - this is a temporary state, until a non-duplicate ACK arrives
  - when Fast Recovery ends, set cwnd = ssthresh
Fast Recovery

Rationale:
- cwnd increases only when a new segment is ACKed; in the 3 Dup situation, it may take time until such an ACK arrives
- until that time, we increase cwnd on the arrival of each duplicate ACK, including the three that triggered Fast Retransmit
- when a new ACK arrives, set cwnd = ssthresh

Fast Recovery state:
- initialize: cwnd += 3 MSS
- on each additional duplicate ACK, increase cwnd by MSS
- when a new segment is acknowledged, set cwnd = ssthresh
- recall that ssthresh was set to half of the last ewnd value in the CA state
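The contrast between Tahoe and Reno reduces to how each reacts to the two loss signals. A sketch (invented name `on_congestion`; ewnd = cwnd assumed, as in the discussion above):

```python
# How Tahoe and Reno react to a loss event (sketch; ewnd == cwnd assumed).
def on_congestion(flavor: str, event: str, cwnd: int, mss: int):
    """Return (ssthresh, new_cwnd, next_state) after a loss signal.
    flavor: 'tahoe' | 'reno'; event: 'timeout' | '3dup'."""
    ssthresh = max(cwnd // 2, mss)           # halve the window (floor at 1 MSS)
    if flavor == "tahoe" or event == "timeout":
        return ssthresh, mss, "slow_start"   # restart from 1 MSS
    if flavor == "reno" and event == "3dup":
        # fast recovery: cwnd = ssthresh + 3 MSS, slow start skipped
        return ssthresh, ssthresh + 3 * mss, "fast_recovery"
    raise ValueError((flavor, event))
```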
TCP Reno cwnd Trace

[Figure: congestion window (segments, 0-70) vs time (0-60): slow-start periods grow cwnd exponentially up to the threshold; congestion-avoidance periods then add roughly one segment per round (additive increase); on a triple duplicate ACK, fast retransmission halves the window and the slow-start stage is skipped (fast recovery); timeouts reset cwnd to 1 and restart slow start.]
TCP Reno Congestion Control State Transition Diagram

States: slow start, congestion avoidance, fast recovery. Transitions:
- slow start -> congestion avoidance: when cwnd > ssthresh
- slow start or congestion avoidance -> fast recovery: on loss signaled by 3 duplicate ACKs
- fast recovery -> congestion avoidance: on a new ACK
- any state -> slow start: on loss signaled by timeout
TCP Reno Congestion Control FSM

INIT: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.

slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

congestion avoidance:
- new ACK: cwnd = cwnd + MSS*(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Popular "flavors" of TCP

[Figure: cwnd (window size, in segments) vs transmission round for TCP Tahoe and TCP Reno; after a triple duplicate ACK, Tahoe drops to 1 segment while Reno drops to ssthresh, the halved window.]
Summary: TCP Reno Congestion Control

- when cwnd < ssthresh, the sender is in the slow-start phase: the window grows exponentially
- when cwnd >= ssthresh, the sender is in the congestion-avoidance phase: the window grows linearly
- when a triple duplicate ACK occurs: ssthresh is set to cwnd/2, and cwnd is eventually set to ~ssthresh (after a detour through the Fast Recovery state)
- when a timeout occurs: ssthresh is set to cwnd/2, and cwnd is set to 1 MSS
TCP throughput

Q: what is the average throughput of TCP as a function of window size and RTT? (ignoring slow start)
- let W be the window size when loss occurs
- when the window is W, the throughput is W/RTT
- just after a loss, the window drops to W/2 and the throughput to (W/2)/RTT, then grows linearly back
- average throughput: 0.75 * W/RTT
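The 0.75 factor is just the mean of the sawtooth: the window climbs linearly from W/2 to W, so its average is 3W/4. A quick numeric check of that claim (invented name, toy discretization):

```python
# Numeric check of the sawtooth average: window W/2 -> W linearly,
# so the mean throughput is 0.75 * W / RTT.
def avg_sawtooth_throughput(w_bytes: float, rtt: float, steps: int = 1000) -> float:
    total = 0.0
    for i in range(steps):
        # window grows linearly from W/2 (just after loss) back to W
        window = w_bytes / 2 + (w_bytes / 2) * i / (steps - 1)
        total += window / rtt
    return total / steps
```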
TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.

[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
Why is TCP fair?

Two competing sessions (Tahoe, Slow Start ignored):
- congestion avoidance: additive increase gives a slope of 1 as the throughputs increase
- loss: multiplicative decrease cuts each throughput by a factor of 2

[Figure: phase plot with Connection 1 throughput on the x-axis and Connection 2 throughput on the y-axis, both bounded by R; the line y = x is the equal bandwidth share. Starting at (a, b), additive increase moves the point along slope 1 to (a+t, b+t), i.e. along the line y = x + (b-a). A loss halves both coordinates to ((a+t)/2, (b+t)/2), which lies on y = x + (b-a)/2; the next cycle lies on y = x + (b-a)/4, and so on. The offset from the equal-share line is halved at each loss, so the operating point converges to the equal bandwidth share.]
Fairness (more)

Fairness and UDP:
- multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
- instead they use UDP: pump audio/video at a constant rate, tolerate packet loss

Fairness and parallel TCP connections:
- nothing prevents an application from opening parallel connections between two hosts; web browsers do this
- example: a link of rate R already supporting 9 connections
  - a new app that asks for 1 TCP connection gets rate R/10
  - a new app that asks for 11 TCP connections gets more than R/2!
Extra Slides

Exercise: MSS = 1000; only one event per row.