Transport Services and Protocols Transport Layer protocols
TLP 1
Goals:
Understand principles behind Transport layer services
Instantiation and implementation in the Internet
TLP 2
Transport Layer - Overview
Understanding:
Transport layer services
Multiplexing/demultiplexing
Connectionless transport: UDP
Principles of reliable data transfer
Connection-oriented transport: TCP
reliable transfer
flow control
connection management
TCP congestion control
TLP 3
Transport Services and Protocols
Provide logical communication between application processes running on different hosts
Transport protocols run in end systems (only)
Transport vs network layer services:
network layer: data transfer between nodes/end systems
transport layer: data transfer between processes at end systems; relies on, but enhances, network layer service capability
[Figure: protocol stacks (application, transport, network, data link, physical) at the two end hosts and (network, data link, physical) at the nodes in between, showing logical end-end transport]
TLP 4
Transport Layer protocols
Internet transport services:
Reliable, in-order unicast delivery: TCP
• congestion control
• flow control
• connection setup
Unreliable (best-effort), unordered unicast or multicast delivery: UDP
Services not provided by TCP:
• real-time (need RTP, RTCP)
• bandwidth guarantees
• reliable multicast
TLP 5
UDP
• Delivers packets without guarantee of arrival or of in-order delivery
• No handshaking and no ACKnowledgement, hence fast response
• Reliability of the transfer is the application’s responsibility
TCP
• Needs connection setup before transmission
• Guarantees packet delivery (no duplication) and in-order reception; byte-stream oriented
• Reliability of the transfer is TCP’s responsibility
[Figure: a UDP datagram or TCP segment is carried inside an IP packet; ACK packets are used by TCP only]
TLP 6
Transmission Control Layer
• Two transport protocols in the TCP/IP (Internet) architecture:
- UDP (User Datagram Protocol) (RFC 768)
~ provides connectionless, unreliable service without flow control
- TCP (Transmission Control Protocol) (RFC 793, 1122, 1323, 2018, 2581)
~ provides connection-oriented, reliable, byte-stream-oriented service with flow control
• Encapsulation of the transport layer’s PDU in an IP packet:
IP header (20B) | TCP header (20B) or UDP header (8B) | TCP/UDP data
The UDP datagram or TCP segment is the payload of the IP packet (65535B max)
B: bytes
TLP 7
Multiplexing and Demultiplexing
Multiplexing:
~ gathering data from multiple application processes (sockets), enveloping data with header (later used for demultiplexing)
Demultiplexing:
~ delivering received segments to correct app layer processes via socket
Segment:
~ unit of data exchanged between transport layer entities, aka TPDU: Transport Protocol Data Unit
[Figure: three hosts, each with an application/transport/network stack; processes P1 to P4 attach to the transport layer through sockets; a segment consists of a segment header (Ht) and application-layer data (M), and the network layer adds its own header (Hn); the receiver delivers the segment to process P]
TLP 8
Multiplexing/Demultiplexing (cont’d)
UDP header fields: Source port | Destination port | Message length | Checksum, followed by UDP data
TCP header fields: Source port | Destination port | Sequence number | Acknowledgment number | Data offset | Reserved | Code bits (URG, ACK, PSH, RST, SYN, FIN) | Window | Checksum | Urgent pointer | Option | Padding, followed by TCP data
Frame layout: DA | SA | TF | IP header | TCP/UDP header | Data | CRC
• Multiplexing/Demultiplexing:
– based on IP addresses and the sender’s and receiver’s port numbers
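A minimal sketch of how the port numbers can be read from a received segment. It relies only on the fact that both the UDP and the TCP header begin with the 16-bit source port followed by the 16-bit destination port; the function name `read_ports` is an illustrative choice, not something taken from the slides.

```python
import struct


def read_ports(segment: bytes) -> tuple:
    """Return (source port, destination port) from a raw TCP or UDP segment.

    Both headers start with two 16-bit fields in network byte order.
    """
    if len(segment) < 4:
        raise ValueError("segment too short to contain the port fields")
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return src_port, dst_port


# Example: an 8-byte UDP header (ports, length, checksum) with no data.
udp_header = struct.pack("!HHHH", 5678, 23, 8, 0)
print(read_ports(udp_header))
```

The receiving host would combine these two values with the IP addresses from the IP header to pick the socket.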
TLP 9
TCP Well-known Port Numbers
Port Number  Description
0            Reserved
1            TCP Multiplexer
5            Remote Job Entry
7            Echo
9            Discard
11           Active Users
13           Daytime
15           Network status program
17           Quote of the day
19           Character generator
20           FTP data
21           FTP command
23           Terminal Connection
25           SMTP
37           Time
42           Host Name Server
43           Who is
53           Domain Name Server
77           Private RJE service
79           Finger
80           HTTP protocol
93           Device Control Protocol
95           SUPDUP Protocol
101          NIC host name server
102          ISO-TSAP
103          X.400 mail service
104          X.400 mail sending
111          SUN RPC
113          Authentication Service
117          UUCP-path service
119          USENET news Transfer Protocol
129          Password Generator Protocol
139          NETBIOS Session Service
• Source port numbers
~ randomly assigned by the sending host (1024 < # < 65536)
• Destination port numbers
~ the well-known one (# < 1024) or the incoming source port #
TLP 10
UDP Well-known Port Numbers
Port Number  Description
0            Reserved
7            Echo
9            Discard
11           Active Users
13           Daytime
15           Network status program
17           Quote of the day
19           Character Generator
37           Time
42           Host Name Server
43           Who is
53           Domain Name Server
67           Bootstrap Protocol Server
68           Bootstrap Protocol Client
69           Trivial File Transfer (TFTP)
111          Sun Microsystems RPC
123          Network Time Protocol (NTP)
161          SNMP net monitor
162          SNMP traps
512          UNIX comsat
513          UNIX rwho daemon
514          System log
525          Time daemon
TLP 11
Assigned, Registered and Dynamic Port Numbers
• RFC 1700.
• FTP site: ftp://isi.edu./in-notes/iana/assignments.
– Up-to-date assignments of numbers
• Assigned port numbers range from 0 - 1023.
– Assigned numbers are reserved by IANA and cannot be used for other purposes
– Used for TCP, IP, UDP and various applications such as TELNET
• Registered port numbers range from 1024 - 65535; these are registered by companies for their applications.
• Dynamic port numbers are also in the range of 1024 - 65535.
[ check with Unix/Linux files: /etc/services ]
TLP 12
How Demultiplexing works?
• When host receives IP datagrams . . .
– each datagram has source IP address, destination IP address
– each datagram carries 1 transport-layer segment
– each segment has source, destination port number (recall: well-known port numbers for specific applications)
• Host uses IP address & port number to direct segment to the appropriate socket (w.r.t. a process)
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (message)]
TLP 13
Mux/DeMux (TCP): Example I
• One process to one connection
[Figure: Telnet client P’ on host A and Telnet server P on server B, each with an application/transport/network stack]
A to B: src port: 5678, dest. port: 23
B to A: source port: 23, dest. port: 5678
TLP 14
Mux/DeMux (TCP) : Example II
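• Multiple connections to multiple processes
Web server host B: B’s IP = 140.124.13.3, well-known port = 80
Web client host C, first connection: C’s IP: 140.112.234.2, Dest IP: B, src port: 7976, dest. port: 80
Web client host C, second connection: C’s IP: 140.112.234.2, Dest IP: B, src port: 8879, dest. port: 80
Web client host A: Src IP: 140.124.70.13, Dest IP: 140.124.13.3, source port: 8879, dest. port: 80
[Figure: client processes P1’, P2’ and P1’ on the client hosts; the transport layer of host B demultiplexes (DeMUX) the three connections to server processes P1, P2, P3]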
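The example can be sketched as a lookup table keyed by the full (source IP, source port, destination IP, destination port) tuple, which is why two clients using the same source port 8879 still reach different server processes. This is only an illustration: the class `Demultiplexer` and its method names are hypothetical and do not describe any real TCP implementation.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]                 # (IP address, port)
ConnectionKey = Tuple[str, int, str, int]  # (src IP, src port, dst IP, dst port)


class Demultiplexer:
    """Toy demultiplexer: maps incoming segments to process names."""

    def __init__(self) -> None:
        self.listeners: Dict[Endpoint, str] = {}
        self.connections: Dict[ConnectionKey, str] = {}

    def listen(self, ip: str, port: int, process: str) -> None:
        self.listeners[(ip, port)] = process

    def establish(self, src_ip: str, src_port: int,
                  dst_ip: str, dst_port: int, process: str) -> None:
        self.connections[(src_ip, src_port, dst_ip, dst_port)] = process

    def deliver(self, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
        # An established connection wins; otherwise fall back to the listener.
        key = (src_ip, src_port, dst_ip, dst_port)
        if key in self.connections:
            return self.connections[key]
        return self.listeners.get((dst_ip, dst_port))


B = "140.124.13.3"
demux = Demultiplexer()
demux.listen(B, 80, "listener")
demux.establish("140.112.234.2", 7976, B, 80, "P1")
demux.establish("140.112.234.2", 8879, B, 80, "P2")
demux.establish("140.124.70.13", 8879, B, 80, "P3")
print(demux.deliver("140.124.70.13", 8879, B, 80))
```

The assignment of P1, P2 and P3 to particular connections is an assumption made for the sketch; the figure residue does not say which process serves which client.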
TLP 16
MUX/DeMUX Happens Everywhere
[Figure: the TCP/IP stack, with multiplexing on the way down and demultiplexing on the way up (from the physical link) at every layer boundary]
AP Layer: SMTP, FTP, TELNET, PING, TRACEROUTE, DNS, SNMP, BOOTP, NTP
TP Layer: TCP, UDP (segment or datagram); demultiplexing based on port #
Internet Layer (S/W modules): IP, ICMP, IGMP, ARP, RARP (packet/datagram); demultiplexing based on protocol type
Ntwk Access Layer: DATA LINK (e.g., Ethernet), Medium (frames and bits); demultiplexing based on frame’s L/T (Interface-SAP)
TLP 17
UDP: User Datagram Protocol [RFC 768]
Why is there a UDP?
no connection setup/establishing time (which can add delay)
simple: no connection state at sender’s and receiver’s app
small segment header
• Low overhead
no congestion control: UDP can blast away as fast as desired (unregulated sending rate)
“no frills,” “bare bones” Internet transport protocol
“best effort” service, UDP segments may be:
lost
delivered out of order to applications
connectionless:
no handshaking between UDP sender, receiver
each UDP segment handled independently of others
TLP 18
UDP Header and Segment Format
Often used for streaming multimedia apps:
loss tolerant
rate sensitive
Other UDP uses:
DNS
SNMP
Reliable transfer over UDP: add reliability at application layer
application-specific error recovery!
[Figure: UDP segment format, 32 bits wide: source port #, dest port #, length, checksum, application data (message). Length is in bytes of the UDP segment, including header]
TLP 19
UDP Checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
treat segment contents as sequence of 16-bit integers
checksum: addition (1’s complement sum) of segment contents
sender puts checksum value into UDP checksum field
Receiver:
compute checksum of received segment
check if computed checksum equals checksum field value:
NO - error detected
YES - no error detected. But maybe errors nonetheless?
See next slide for implementation details (pp. 200-201)
TLP 20
Checksum in the UDP Header
• ChecKSum (CKS): same as IP’s CKS, with the following differences:
1. Allowing odd # of data bytes (by padding one byte of “0” but not transmitting it)
2. Including pseudo-header from IP header (12 bytes counted in total):
SRC IP (4B), DEST IP (4B), 00 + Protocol (2B), UDP length (2B)
• Goal: to verify that the UDP DG has reached its correct destination
• No CKS used if CKS = all 0’s being transmitted.
• Transmit 65535 if computed CKS = all 0’s (one’s complement)
• CKS covers the pseudo hdr, the UDP header and the UDP data (plus 8-bit 0’s if necessary)
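The rules above can be put together in a short sketch. It computes the one's complement sum over the 12-byte pseudo-header, the 8-byte UDP header (checksum field set to zero) and the data, pads an odd-length input with one zero byte for the computation only, and transmits 65535 when the result is zero. The function names are illustrative; protocol number 17 for UDP is the standard IP protocol value.

```python
import socket
import struct

UDP_PROTOCOL = 17  # IP protocol number for UDP


def ones_complement_sum16(data: bytes) -> int:
    """One's complement sum of the data taken as 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad byte used for the computation, never transmitted
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTOCOL, udp_length))


def udp_checksum(src_ip, dst_ip, src_port, dst_port, data: bytes) -> int:
    length = 8 + len(data)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum16(pseudo_header(src_ip, dst_ip, length) + header + data)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # all 0's means "no checksum", so send 65535


def udp_verify(src_ip, dst_ip, src_port, dst_port, data: bytes, checksum: int) -> bool:
    length = 8 + len(data)
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    total = ones_complement_sum16(pseudo_header(src_ip, dst_ip, length) + header + data)
    return total == 0xFFFF


cks = udp_checksum("140.124.70.13", "140.124.13.3", 5678, 53, b"hello")
print(udp_verify("140.124.70.13", "140.124.13.3", 5678, 53, b"hello", cks))
```

Because the destination IP address is part of the pseudo-header, a datagram delivered to the wrong host fails verification, which is the stated goal of including it.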
TLP 21
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
full duplex data:
bi-directional data flow in same connection
MSS: maximum segment size
connection-oriented:
handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
flow controlled:
sender will not overwhelm receiver
point-to-point:
one sender, one receiver
reliable, in-order byte stream:
no “message boundaries”
pipelined:
TCP congestion and flow control set window size
send & receive buffers
[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
TLP 22
Principles of Reliable data transfer
Important issue in application, transport, and link layers
Top of important networking topics!
[Figure: reliable data transfer service model over the network layer; one routine is called when data arrives, another is called when a pkt arrives (details coming next)]
Characteristics of an unreliable channel will determine the complexity of a reliable data transfer (rdt) protocol.
udt ~ unreliable data transfer protocol (IP, here)
TLP 23
Reliable data transfer: getting started
rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on rcv-side of channel
deliver_data(): called by rdt to deliver data to upper layer
[Figure: send side and receive side]
TLP 24
IP contradicts TCP?
• TCP provides completely reliable transfer
• (But) IP offers best-effort (unreliable) delivery
• TCP uses IP? (YES) How is it done?
Reliable data transmission relies on . . .
- Positive acknowledgment
~ Receiver returns a short message (called ACK, acknowledgement) to the sender when data arrives
- Retransmission (upon timeout)
~ Sender starts a timer whenever a segment is transmitted
~ If the timer expires before the acknowledgment arrives, sender retransmits THE message
• Recall: C.O. vs. C.L.
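A toy simulation of positive acknowledgment with retransmission, under simplifying assumptions: the channel is modeled by a set of transmission indices that get lost, a lost transmission stands for "timer expired without an ACK", and ACKs themselves are never lost (so no duplicates arise). The function name and parameters are hypothetical.

```python
def send_reliably(messages, lost_transmissions, max_tries=10):
    """Deliver every message in order over a channel that drops some transmissions.

    lost_transmissions: set of 0-based indices, counted over all transmissions
    (first attempts and retransmissions alike), that the channel drops.
    Returns (delivered messages, total number of transmissions).
    """
    delivered = []
    transmissions = 0
    for message in messages:
        for _ in range(max_tries):
            lost = transmissions in lost_transmissions
            transmissions += 1
            if lost:
                continue  # timer expires before an ACK arrives: retransmit
            delivered.append(message)  # receiver got it and returns an ACK
            break
        else:
            raise RuntimeError("giving up after max_tries attempts")
    return delivered, transmissions


print(send_reliably(["a", "b", "c"], lost_transmissions={1, 2}))
```

In this run the second message needs three transmissions, yet the receiver still sees all three messages exactly once and in order.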
TLP 25
TCP Header - I
[Figure: TCP header layout, with the head length and receiver window size fields labeled]
• TCP packs data into “segments” but counts/tracks by bytes.
• Seq# and Ack#: counting by bytes of data (not segments)!
TLP 26
TCP Header – II
• Sequence number (SEQ #):
- identifies each byte in the stream of data from the sending TCP to the receiving TCP (byte streams)
- numbering ranges from 0 to 2^32 - 1 and wraps back around to 0
- SEQ # = (so-called) initial SEQ # (ISN) when SYN = 1 (the first data segment = ISN + 1)
• Acknowledgment number:
- the next sequence number that the receiver expects to receive (i.e., the piggybacked ACK)
= the SEQ # of the last successfully received data byte + 1
(ACK (flag) = 1 when the connection is first established)
SQN is bound to octets rather than to entire segments.
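A small sketch of the byte-counting rule: the acknowledgment number is the sequence number of the first byte in the segment plus the number of data bytes, taken modulo 2^32 so that it wraps around to 0. The helper name `next_ack` is illustrative.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers run from 0 to 2^32 - 1


def next_ack(seq: int, data_bytes: int) -> int:
    """Acknowledgment number for a segment starting at seq with data_bytes of data."""
    return (seq + data_bytes) % SEQ_SPACE


print(next_ack(92, 8))            # segment with Seq=92 carrying 8 bytes
print(next_ack(SEQ_SPACE - 1, 1))  # wraps back around to 0
```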
TLP 27
TCP Header – III
• Data Offset = header length (HL) in 32-bit words (60 bytes max)
• Code bits:
- URG “urgent pointer” field is valid (when it is set to 1)
- ACK makes the ACK number valid (when it is set to 1)
- PSH sender should send out all data in the sending buffer; receiver should pass this data to an application ASAP
- RST reset the connection (port unreachable)
- SYN synchronize sequence numbers to initiate a connection
- FIN sender is finished sending data (ask to close connection)
(RST, SYN, FIN: conn. management)
• Window (for credit allocation flow control): indicates the number of bytes the sender of this segment is willing to accept
TLP 28
TCP Connection Establishment
• Establishing a connection between two ends before exchanging data
• Connection establishing protocol ~ a three-way handshake
(Open a conn. = open a socket)
Client: active open. Server: Listen (passive open).
1. client to server: SYN = 1, Seq# = j (j = client’s ISN); client enters SYN_SENT
2. server to client: SYN, Seq# = k, ACK = j+1 (k ~ Rxer’s seq #, its ISN); server enters SYN_RCVD
3. client to server: ACK = k+1; connection Established at both ends
Each end initializes TCP variables: seq. #, buffers, flow control info (e.g. RcvWindow)
- SYN consumes one sequence number
- ISN should change over time (differs from connection to connection)
TLP 29
Decompose PDUs in a TCP/IP Scenario
Windows> telnet 140.124.70.26 (showing the first two packets sent by the client)
Protocol #: Network--Transport layer
TLP 30
(PDU cont’d)
Port #: Transport--Application layer
Src port # (randomly generated by the src PC – 1059, here)
Dest port # (a well-known one for a well-known application)
(Selective ACKnowledgment) (for reliable, in-order reception)
- see next pages
TLP 31
Stop-and-Wait Protocol
• Sends one segment and waits for the ACK to return before sending the next segment (performance)
[Figure: sender/receiver timing diagram over the pipe (assuming no error):
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
ACK arrives, send next packet, t = RTT + L / R]
TLP 32
Performance of Stop-and-Wait Protocol (rdt3.0 – Alternating-bit protocol, textbook)
rdt3.0 works, but performance stinks
Performance issue: Example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:
T_transmit = L / R = (8 kb/pkt) / (10**9 b/sec) = 8 microsec
(L = packet size, R = channel capacity)
• Sender/channel utilization (fraction of time sender is busy sending bits into the channel):
U_sender = (L / R) / (RTT + L / R) = 0.008 ms / 30.008 ms = 0.00027 (or 0.027%)
(15.008 x 2, if ACK ignored)
Send 1KB pkt every 30.008 msec (ref. P.214)
effective throughput only 267 kbps over 1 Gbps link
network protocol limits use of physical resources a lot!
TLP 33
Pipelined protocols (Why needed?)
Pipelining: allowing sender to send multiple, “in-flight”, yet-to-be-acknowledged pkts w/o waiting for ACKs (filling a pipeline)
For reliable data transfer:
the range of sequence numbers must be increased (not retx.)
need to buffer more than one packet at sender and/or receiver
Two generic forms of pipelined protocols (pipelined with error recovery):
Go-Back-N and Selective Repeat
• Seq. # range and buffering depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets.
TLP 34
Pipelining: increasing utilization
[Figure: sender/receiver timing diagram (assuming no error):
first packet bit transmitted, t = 0
last bit transmitted, t = L / R
first packet bit arrives
last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
ACK arrives, send next packet, t = RTT + L / R (next cycle begins)]
Increase utilization by a factor of 3!
U_sender = (3 * L / R) / (RTT + L / R) = 0.024 / 30.008 = 0.0008 (0.08%)
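The utilization figures for stop-and-wait and for a pipeline of three packets can be reproduced with a few lines of arithmetic. This is a sketch using the example values from the slides (1 Gbps link, 30 ms RTT, 8000-bit packet); the function names and the `window` parameter are illustrative.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy: window * (L/R) / (RTT + L/R), capped at 1."""
    transmit_time = packet_bits / rate_bps
    return min(1.0, window * transmit_time / (rtt_s + transmit_time))


def effective_throughput_bps(packet_bits, rate_bps, rtt_s, window=1):
    """Bits delivered per second when `window` packets are sent per cycle."""
    return sender_utilization(packet_bits, rate_bps, rtt_s, window) * rate_bps


stop_and_wait = sender_utilization(8000, 1e9, 0.030)          # window of 1
pipelined = sender_utilization(8000, 1e9, 0.030, window=3)    # three in-flight pkts
print(f"{stop_and_wait:.5f} {pipelined:.5f}")
print(f"{effective_throughput_bps(8000, 1e9, 0.030) / 1000:.0f} kbps")
```

The cap at 1.0 reflects that once the window is large enough to fill the pipe, a larger window cannot raise utilization further.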
TLP 35
Go-Back-N
Sender:
k-bit seq # in pkt header
“window” of up to N consecutive unACKed pkts allowed (the window size)
Preview: sliding window
ACK(n): ACKs all pkts up to, including seq # n ~ “cumulative ACK” (Advantage: see Fig. 3.34)
may receive duplicate ACKs (see receiver) ?? You find it out.
Set timer for each in-flight pkt
timeout(n): retransmit pkt n and all higher seq # pkts in window
TLP 36
GBN (Cont’d)
Receiver:
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
may generate duplicate ACKs
need only remember expected seqnum
out-of-order packet:
discard (don’t buffer): no receiver buffering
ACK pkt with highest in-order seq #
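A minimal sketch of the Go-Back-N receiver rules: it remembers only the expected sequence number, discards anything out of order, and always ACKs the highest in-order packet received so far. Sequence-number wraparound and corrupted packets are left out for brevity, and the function name is illustrative.

```python
def gbn_receiver(arrivals):
    """Process packets in arrival order; return (delivered seq #s, ACKs sent)."""
    expected = 0
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)  # in order: deliver to the upper layer
            expected += 1
        # otherwise: out of order, discard without buffering
        if expected > 0:
            acks.append(expected - 1)  # cumulative ACK, possibly a duplicate
    return delivered, acks


# Packet 2 is lost at first; 3 and 4 are discarded, then 2, 3, 4 are resent.
print(gbn_receiver([0, 1, 3, 4, 2, 3, 4]))
```

The repeated ACK for packet 1 in the output is the duplicate ACK mentioned on the sender slide.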
TLP 37
GBN in action
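[Figure: after a loss, the receiver discards each out-of-order pkt that follows (discard, discard, discard) until the sender times out and retransmits (reTx)]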
TLP 38
Selective Repeat/Selective Reject
Receiver individually acknowledges all correctly received pkts
buffers pkts, as needed, for eventual in-order delivery to upper layer
Sender only resends pkts for which ACK not received
sender timer for each unACKed pkt
Sender window
N consecutive seq #’s
again limits seq #s of sent and unACKed pkts
TLP 39
Selective repeat: sender, receiver windows
(Read: Fig. 3.23-25 for sender’s and receiver’s events and actions)
TLP 40
Selective Repeat in action
Window size = 4
loss
TLP 41
Selective Repeat: a dilemma
Example:
seq #’s: 0, 1, 2, 3 (size = 4)
window size = 3 (less than the seq # size)
Receiver sees no difference in both scenarios (a) and (b).
Incorrectly passes duplicate data as new in case (a)
[Figure: in (a) the sender retransmits the 1st pkt (duplicate pkt 0); in (b) it transmits the 5th pkt (new pkt 0). Behind the “invisible curtain” of the Internet, the receiver sees seq # 0 in both cases]
Q: To prevent this ambiguity, what should be the relationship between seq # size and window size?
A: sequence # space >= 2 * window (Sec. 3.4.4, Problem 3.18)
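The dilemma can be checked mechanically. The sketch below assumes the worst case from the example: the sender transmits a full window starting at seq # 0, the receiver gets all of it and advances its window, but every ACK is lost, so the sender retransmits pkt 0. If the old seq # 0 falls inside the receiver's new window, the duplicate is taken for new data. Function names are illustrative.

```python
def receiver_window(base, window, seq_space):
    """Set of sequence numbers the receiver currently accepts as new."""
    return {(base + i) % seq_space for i in range(window)}


def duplicate_accepted_as_new(window, seq_space):
    """True if a retransmitted pkt 0 lands in the receiver's advanced window."""
    new_base = window % seq_space  # receiver got pkts 0 .. window-1
    return 0 in receiver_window(new_base, window, seq_space)


print(duplicate_accepted_as_new(window=3, seq_space=4))  # the slide's example
print(duplicate_accepted_as_new(window=2, seq_space=4))  # seq # space = 2 * window
```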
TLP 42
Connection Maintenance (Ex: Telnet Scenario), an interactive application
Q: How does the receiver handle out-of-order segments?
A: TCP spec doesn’t say ~ up to implementor (Go-Back-N or Selective Repeat)
Segment exchange between Host A and Host B (over time):
1. User types ‘C’. A to B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’. B to A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’. A to B: Seq=43, ACK=80, . . .
• "echo back" ensures that characters seen by the Telnet user have already been received and processed at the remote site.
• Each character traverses the network twice
TLP 43
TCP Connection Termination
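• Connection termination protocol: 3-way, but taking four segments
(Close a conn. = close a socket)
Client: active close. Server: passive close.
1. client to server: FIN = M (FIN flag = 1, Seq# = M); client enters FIN_WAIT_1
2. server to client: ACK = M+1 (ACK flag = 1, Ack# = M+1); server enters CLOSE_WAIT, client enters FIN_WAIT_2
3. server to client: FIN = N; server enters LAST_ACK
4. client to server: ACK = N+1; client enters TIME_WAIT (timed wait, the 2 MSL wait state), resending the ACK in case it is lost
Server enters CLOSED (if ACK rxed); client enters CLOSED after the timed wait.
Resources at both C and S are deallocated.
• MSL = Max Segment Lifetime; MSL in RFC 793 = 2 min, max.
TLP 44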
TCP Connection Management
TCP client lifecycle
TCP server lifecycle
TLP 45
TCP: retransmission scenarios
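Lost ACK scenario (Host A, Host B, over time):
1. A to B: Seq=92, 8 bytes data
2. B to A: ACK=100, lost (X)
3. Seq=92 timeout at Host A; A to B: Seq=92, 8 bytes data (duplicated. Host B’s action?)
4. B to A: ACK=100
Premature timeout, cumulative ACKs scenario (Host A, Host B, over time):
1. A to B: Seq=92, 8 bytes data
2. A to B: Seq=100, 20 bytes data
3. B to A: ACK=100, then ACK=120
4. Seq=92 timeout at Host A before ACK=100 arrives; A to B: Seq=92, 8 bytes data (new timeout for seq.=92)
5. B to A: ACK=120 (a Seq=100 timeout interval is also shown in the figure)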
TLP 46
TCP Flow Control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
- RcvWindow field in TCP segment
sender: limits the amount of transmitted, unACKed data to less than most recently received RcvWindow
- guarantees receive buffer doesn’t overflow
RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer
❒ spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
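The RcvWindow formula and the sender-side check translate directly into code. This is a sketch: the variable names follow the slide (RcvBuffer, LastByteRcvd, LastByteRead), while `sender_may_send` and its parameters are illustrative, and the buffer sizes in the example are made up.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if nbytes more can go out without exceeding the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(sender_may_send(5000, 4000, window, nbytes=1000))
```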
TLP 47
Flow Control - Sliding Window
• To improve the utilization of the channel in the cases of Tprop > Tframe by allowing multiple frames to be transmitted before receiving ACK(s) (to improve the performance of the stop-and-wait mechanism)
• To keep track of which frames have been ACKed, each frame is labeled with a sequence number.
• Rule of sliding window:
- Txer maintains a list of SEQ numbers that it is allowed to send
- Rxer maintains a list of SEQ numbers that it is prepared to receive
- Frames are numbered (0 ~ 2^k - 1) modulo 2^k, k = # of bits in SEQ #
- The window size is limited in terms of 2^k (?), and the SEQ # has a bounded size since it occupies a field in the frame
- Sender must buffer these frames in case they need to be retransmitted
• Applied to Go-back-N and Selective-reject ARQ, and LLC, HDLC, and X.25
[Figure: window of frames]
TLP 48
Sliding window flow control (cont’d)
[Figure: example of sliding window flow control; an ACK is sent as RR ~ Receiver Ready (in HDLC)]
Back to GBN
TLP 49
TCP Flow Control - Credit Allocation
• Operation:
- Sending TCP includes the SEQ # of the first byte in the segment field
- Receiving TCP ACKs an incoming segment with (A=i, W=j), where
A=i: expecting SEQ = i, and all SEQ prior to i are ACKed
W=j: granting permission to send an additional j (window) bytes, i.e., corresponding to SEQ # in i ~ (i+j-1)
• Some examples of granting credit:
[Assuming Rxer just issued (A=i, W=j)]
- Rxer issues (A=i, W=k) to increase credit to k (k > j) when no additional data have arrived
- Rxer issues (A=i+m, W=j-m) without granting additional credit to ACK an incoming segment containing m bytes (m < j)
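A short sketch of the credit arithmetic: a grant (A=i, W=j) permits the bytes with SEQ # i through i+j-1, and acknowledging m bytes with (A=i+m, W=j-m) leaves the upper edge of that range unchanged. The function names are illustrative; the numbers reuse the values A=1001, W=1400 and 200-byte segments that appear in the example on the slides.

```python
def granted_range(a, w):
    """Sequence numbers the sender may use under the grant (A=a, W=w)."""
    return range(a, a + w)


def remaining_credit(a, w, next_seq):
    """Bytes the sender may still send when its next unsent byte is next_seq."""
    return a + w - next_seq


grant = granted_range(1001, 1400)
print(grant[0], grant[-1])
# After three 200-byte segments the next byte to send is SEQ # 1601.
print(remaining_credit(1001, 1400, next_seq=1601))
```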
TLP 50
Example
- sending 200 bytes/segment; sending and receiving SEQ # are synchronized through connection establishment
- initial credit = 1400 bytes, and SEQ # = 1001
[Figure: credit allocation exchange starting from A=1001, W=1400 (granted permission), tracking the remaining credits, with a later grant of + 600]
TCP Round Trip Time and Timeout
Q: How to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
SampleRTT will vary, want estimated RTT “smoother”
average several recent measurements, not just current SampleRTT
Q: How to set TCP timeout value?
longer than RTT
note: RTT will vary
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss
TLP 52
Estimation of RTT
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
- Exponential weighted moving average (why?)
- influence of given sample decreases exponentially fast
- typical value of α = 0.125 (RFC 2988)
[Figure: Sample RTT and Estimated RTT, RTT (milliseconds, 100 to 350) versus time (seconds), from gaia.cs.umass.edu to fantasia.eurecom.fr]
TLP 53
RTO (Retransmission Time Out)
Setting the timeout
❒ EstimatedRTT plus “safety margin”
❍ large variation in EstimatedRTT -> larger safety margin
❒ First estimate of how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
(typically, β = 0.25)
❒ Then set timeout interval:
TimeoutInterval (RTO) = EstimatedRTT + 4 * DevRTT
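The two moving averages and the timeout rule can be sketched as a small class, with alpha = 0.125 and beta = 0.25 as the typical weights. Two details are assumptions not stated on the slides and follow RFC 2988: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT on each new sample. The class and attribute names are illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate, deviation, and retransmission timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumption: RFC 2988 initial value

    def update(self, sample_rtt):
        # Assumption: deviation is updated first, using the old estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)  # milliseconds
estimator.update(120.0)
print(estimator.estimated_rtt, estimator.dev_rtt, estimator.timeout_interval)
```

Only samples from segments that were not retransmitted should be fed to `update`, in line with the rule for SampleRTT.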
TLP 54
Principles of Congestion Control
Congestion:
informally: “too many sources sending too much data too fast for network to handle”
different from flow control (w.r.t. receiver)
Manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!
TLP 55
Approaches towards congestion control
Two broad approaches towards congestion control:
1. End-end congestion control:
❒ no explicit feedback from network
❒ congestion inferred from end-system observed loss, delay
❒ approach taken by TCP
2. Network-assisted congestion control:
❒ routers provide feedback to end systems (Ex: choke packet in PSN)
❍ single bit indicating congestion (SNA, DECbit, TCP/IP ECN (RFC 2481), ATM)
❍ explicit rate sender should send at
Ref: Sec 3.6.2~3.6.3
TLP 56
Case study: ATM ABR congestion control
ABR: available bit rate
❒ “elastic service”
❒ if sender’s path “underloaded”:
❍ sender should use available bandwidth
❒ if sender’s path congested:
❍ sender throttled to minimum guaranteed rate
RM (resource management) cells:
❒ sent by sender, interspersed with data cells
❒ Two bits in RM cell set by switches (i.e., “network-assisted”)
❍ NI bit: No Increase in rate (mild congestion)
❍ CI bit: Congestion Indication
❒ RM cells returned to sender by receiver, with bits intact
TLP 57
TCP Congestion Control: Overview
“probing” for usable bandwidth:
- ideally: transmit as fast as possible (Congwin as large as possible) without loss
- increase Congwin until loss (congestion)
- loss happened: decrease Congwin, then begin probing (increasing) again
Important variables:
- Congwin: ~ congestion window size
- threshold: ~ defines threshold between the slow start phase and the congestion control phase
Abbreviations:
• Congwin = cwnd
• threshold = ssthresh
TLP 58
TCP Congestion Control (cont’d)
❒ end-end control (no network assistance)
❒ sender limits transmission: LastByteSent - LastByteAcked <= CongWin
❒ Roughly, rate = CongWin / RTT Bytes/sec
❒ CongWin is dynamic, function of perceived network congestion
How does sender perceive congestion?
❒ loss event = timeout or 3 duplicate acks
❒ TCP sender reduces rate (CongWin) after loss event
Three mechanisms:
❍ AIMD
❍ Slow start
❍ Congestion Avoidance
TLP 59
I. TCP AIMD Congestion Control
❒ Additive Increase: ~ increase CongWin by 1 MSS every RTT in the absence of loss events: probing
❒ Multiplicative Decrease: ~ cut CongWin in half after loss event
[Figure: AIMD operation, congestion window (8, 16, 24 Kbytes) versus time]
TLP 60
II. Slow Start
When connection begins, increase rate exponentially fast until first loss event.
• Operation:
- Initializing cwnd = 1 (1 MSS) whenever opening a new connection
- Increasing cwnd by 1 (up to a Max) every time an ACK is received
- At any time, TCP measures the congestion window in segments and restrains the transmission by
awnd = Min { credit, cwnd }
awnd = allowed window (currently allowed to send w/o receiving ACKs)
cwnd = congestion window (used at startup and reduced during congestion)
credit = receiver advertised window (used to calculate window/segment size)
• Slow start probes the internet to make sure not to send too many segments into an already congested network
• Connection’s data flow is controlled by the incoming ACK (not cwnd)
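A sketch of the slow start operation, counted in segments and one step per RTT: the sender transmits awnd = min(credit, cwnd) segments, and each of them is ACKed and adds 1 to cwnd, which is what makes the growth exponential. The simulation assumes no loss and leaves out the "up to a Max" cap; the function name is illustrative.

```python
def slow_start_trace(rtts, credit, cwnd=1):
    """Allowed window (in segments) for each RTT during loss-free slow start."""
    trace = []
    for _ in range(rtts):
        awnd = min(credit, cwnd)  # what may be sent without waiting for ACKs
        trace.append(awnd)
        cwnd += awnd              # one ACK per segment sent, +1 per ACK
    return trace


print(slow_start_trace(rtts=5, credit=8))
```

In the printed trace the allowed window stops growing once the receiver's credit, not cwnd, becomes the limit.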
TLP 61
Slow Start Operation
- A is sending 100-byte segments
- A can fill the pipe with a continuous flow of segments after approximately FOUR RTTs
• Slow start may be a misnomer since cwnd grows exponentially (pretty much close to). Really slow?
[Figure: initialization of a new connection, with labels in the order they appear:
1st RTT: SN = 1, ACK = 101
2nd RTT: SN = 101, SN = 201, ACK = 201
3rd RTT: SN = 701, ACK = 801
4th RTT: SN = 1401, ACK = 1501]
TLP 62
III. Congestion Avoidance
• Also, Dynamic Window sizing on Congestion (Jacobson [88/95])
~ modified the growth of cwnd from exponential to linear
~ a way to deal with the segment loss: a timeout occurring and receipt of duplicate ACKs
• Operation:
- Begin with slow start algorithm until a congestion occurs:
- Set ssthresh (a slow start threshold) = cwnd/2
- Set cwnd = 1 and perform slow start process (i.e., increase cwnd by 1 for every ACK received) until cwnd = ssthresh
- For cwnd >= ssthresh, increase cwnd by one for each round-trip time (RTT)
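The operation can be traced round by round with a short simulation, counting cwnd in segments. It is a sketch under stated assumptions: congestion is signalled only by timeouts at rounds chosen by the caller, slow start doubles cwnd once per RTT and stops exactly at ssthresh, and integer division is used for cwnd/2. The function and parameter names are illustrative.

```python
def cwnd_trace(rounds, ssthresh, timeout_rounds=()):
    """cwnd (in segments) at the start of each RTT, rounds numbered from 1."""
    cwnd = 1
    trace = []
    for rtt in range(1, rounds + 1):
        trace.append(cwnd)
        if rtt in timeout_rounds:
            ssthresh = max(cwnd // 2, 1)     # ssthresh = cwnd/2
            cwnd = 1                         # restart with slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: exponential growth
        else:
            cwnd += 1                        # congestion avoidance: linear growth
    return trace


print(cwnd_trace(6, ssthresh=8))
print(cwnd_trace(8, ssthresh=16, timeout_rounds={5}))
```

The second call shows a timeout at cwnd = 16: ssthresh drops to 8 and cwnd restarts from 1.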
TLP 63
• Example: Congestion avoidance (cont’d)
[Figure: cwnd per RTT, from the 1st RTT to the 6th RTT. Slow start, ending with a timeout (counted as ONE more RTT), gives ssthresh = 8; exponential growth of cwnd is followed by linear growth of cwnd (cwnd = 9)]
- check how long it would take to recover the cwnd level before congestion?
TLP 64
Comparison of Slow Start and Congestion Avoidance
[Figure: cwnd versus time (RTT): exponential growth of cwnd from 1 up to ssthresh (8), then linear growth of cwnd (9, ...)]
(See what the textbook says.)
TLP 65
TCP Slow Start Algorithm
Slowstart algorithm:
initialize: Congwin = 1
for (each segment ACKed)
    Congwin++
until (loss event OR CongWin > threshold)
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT]
• exponential increase (per RTT) in window size (not so slow!)
• loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
TLP 66
TCP Congestion Avoidance: Tahoe
TCP Tahoe congestion avoidance:
/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart
TLP 67
TCP Congestion Avoidance: Reno
TCP Reno congestion avoidance:
/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
If (loss detected by timeout) {
    Congwin = 1
    perform slowstart
}
If (loss detected by triple duplicate ACK) {
    Congwin = Congwin/2
    Congwin increases linearly
}
• Three duplicate ACKs (Reno TCP):
• Some segments are getting through correctly! (3 dup ACKs indicate network capable of delivering some segments)
• Don’t “overreact” by decreasing window to 1 as in Tahoe – decrease window size by half
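The difference between the two reactions to a loss event fits in one function. This sketch counts the window in segments, uses integer division for Congwin/2, and returns the new (Congwin, threshold) pair; the function name and the string labels for the variant and the loss kind are illustrative.

```python
def on_loss(congwin, loss_kind, variant):
    """Return (new Congwin, new threshold) after a loss event.

    loss_kind: "timeout" or "3dupack"; variant: "tahoe" or "reno".
    """
    threshold = max(congwin // 2, 1)
    if variant == "reno" and loss_kind == "3dupack":
        # Some segments still get through: halve the window, grow linearly.
        return threshold, threshold
    # Tahoe on any loss, Reno on timeout: back to slow start from 1.
    return 1, threshold


print(on_loss(12, "3dupack", "tahoe"))
print(on_loss(12, "3dupack", "reno"))
print(on_loss(12, "timeout", "reno"))
```

Note that the threshold is computed from the window before it is reset; computing it afterwards would always yield a threshold based on Congwin = 1.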
TLP 68
TCP Reno versus TCP Tahoe:
[Fig. 3-51: Evolution of TCP congestion window (Tahoe and Reno): congestion window size (segments, 0 to 14) versus transmission round (1 to 15), with the (variable) threshold marked; series: TCP Tahoe, TCP Reno]
TLP 69
TCP Fairness
• TCP Fairness goal:
~ if K TCP connections pass through a router (share same bottleneck link), each TCP should get R/K of link capacity
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
❒ Example: link of rate R supporting 9 connections;
❍ What if new app asks for 1 TCP? It gets rate R/10
❍ What if new app asks for 11 TCPs? It gets what? (A: about R/2, i.e., 11R/20)
TLP 70
(Joined) Throughput Realized by Two TCPs
How TCP approaches fairness?
• Two competing sessions:
– Additive increase gives slope of 1, as throughput increases
– multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line. From the assumed starting point the trajectory alternates between congestion avoidance: additive increase, and loss: decrease window by factor of 2]
Goal: the achieved throughput falls somewhere around the intersection
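The convergence toward the equal-share line can be seen in a small simulation of two AIMD flows. It is a simplified model, not TCP itself: both flows add 1 unit of throughput per round while their sum stays within the bottleneck capacity, and both halve their throughput in the round after the sum exceeds it (synchronized loss). Names and the starting values are illustrative.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Throughputs of two AIMD flows sharing one bottleneck after `rounds` steps."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1   # no loss: additive increase (slope 1)
    return x1, x2


x1, x2 = aimd_two_flows(1.0, 50.0, capacity=60.0, rounds=500)
print(round(x1, 3), round(x2, 3))
```

Additive increase leaves the gap between the flows unchanged while each multiplicative decrease halves it, so an unequal start (1 versus 50 here) ends up near the equal bandwidth share.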
TLP 71
The End