Transport Services and Protocols Transport Layer protocols


Transcript of Transport Services and Protocols Transport Layer protocols

TLP 1

Goals:

• Understand the principles behind transport layer services
• See how they are instantiated and implemented in the Internet

TLP 2

Transport Layer - Overview

Understanding:

• Transport layer services
• Multiplexing/demultiplexing
• Connectionless transport: UDP
• Principles of reliable data transfer
• Connection-oriented transport: TCP
  - reliable transfer
  - flow control
  - connection management
• TCP congestion control

TLP 3

Transport Services and Protocols

• Provide logical communication between application processes running on different hosts
• Transport protocols run in end systems (only)
• Transport vs. network layer services:
  - network layer: data transfer between nodes/end systems
  - transport layer: data transfer between processes at end systems; relies on, but enhances, network layer service capability

[Figure: protocol stacks. The two end systems run application / transport / network / data link / physical; the intermediate nodes run only network / data link / physical. Logical end-end transport connects the two transport layers.]

TLP 4

Transport Layer protocols

Internet transport services:

• Reliable, in-order unicast delivery: TCP
  - congestion control
  - flow control
  - connection setup
• Unreliable (best-effort), unordered unicast or multicast delivery: UDP

Services not provided by TCP:

• real-time (need RTP, RTCP)
• bandwidth guarantees
• reliable multicast

[Figure: protocol stacks at end systems and intermediate nodes, with logical end-end transport between the two transport layers.]

TLP 5

UDP

• Delivers packets without guarantee (of arrival or of in-order delivery)
• No handshaking and no ACKnowledgement, hence fast response
• Reliability of the link is the application’s responsibility

TCP

• Needs connection setup before transmission
• Guarantees packet delivery (no duplication) and in-order reception; byte-stream oriented
• Reliability of the link is TCP’s responsibility

[Figure: a UDP datagram or TCP segment is carried in an IP packet; ACK packets are returned for TCP only.]

TLP 6

Transmission Control Layer

• Two transport protocols in the Internet architecture:
  - UDP (User Datagram Protocol) (RFC 768)
    ~ provides connectionless, unreliable service without flow control
  - TCP (Transmission Control Protocol) (RFC 793, 1122, 1323, 2018, 2581)
    ~ provides connection-oriented, reliable, byte-stream-oriented service with flow control

• Encapsulation of the transport layer’s PDU in an IP packet (B: bytes):

  IP header (20 B) | TCP (20 B) / UDP (8 B) header | TCP/UDP data

  - The TCP/UDP header plus the data form the TCP segment or UDP datagram.
  - The whole IP packet is at most 65535 B.

TLP 7

Multiplexing and Demultiplexing

Multiplexing:
~ gathering data from multiple application processes (sockets), enveloping the data with a header (later used for demultiplexing)

Demultiplexing:
~ delivering received segments to the correct application layer processes via sockets

Segment:
~ unit of data exchanged between transport layer entities, aka TPDU (Transport Protocol Data Unit); a segment header followed by application-layer data

[Figure: three hosts, each with application / transport / network layers. Processes P1 to P4 attach to the transport layer through sockets; the receiver hands each arriving segment to the matching process.]

TLP 8

Multiplexing/Demultiplexing (cont’d)

UDP header fields:
  Source port | Destination port
  Message length | Checksum
  Data

TCP header fields:
  Source port | Destination port
  Sequence number
  Acknowledgment number
  Data offset | Reserved | URG ACK PSH RST SYN FIN | Window
  Checksum | Urgent pointer
  Option | Padding
  TCP data

[Figure labels for the enclosing frame: DA, SA, TF, IP header, TCP/UDP header, Data, CRC.]

• Multiplexing/Demultiplexing:
  – based on IP addresses and on the sender’s and receiver’s port numbers

TLP 9

TCP Well-known Port Numbers

Port Number   Description
0             Reserved
1             TCP Multiplexer
5             Remote Job Entry
7             Echo
9             Discard
11            Active Users
13            Daytime
15            Network status program
17            Quote of the day
19            Character generator
20            FTP data
21            FTP command
23            Terminal Connection
25            SMTP
37            Time
42            Host Name Server
43            Who is
53            Domain Name Server
77            Private RJE service
79            Finger
80            HTTP protocol
93            Device Control Protocol
95            SUPDUP Protocol
101           NIC host name server
102           ISO-TSAP
103           X.400 mail service
104           X.400 mail sending
111           SUN RPC
113           Authentication Service
117           UUCP-path service
119           USENET news Transfer Protocol
129           Password Generator Protocol
139           NETBIOS Session Service

• Source port numbers
  ~ randomly assigned by the sending host (1024 ≤ # ≤ 65535)
• Destination port numbers
  ~ the well-known one (# < 1024) or the incoming source port #

TLP 10

UDP Well-known Port Numbers

Port Number   Description
0             Reserved
7             Echo
9             Discard
11            Active Users
13            Daytime
15            Network status program
17            Quote of the day
19            Character Generator
37            Time
42            Host Name Server
43            Who is
53            Domain Name Server
67            Bootstrap Protocol Server
68            Bootstrap Protocol Client
69            Trivial File Transfer (TFTP)
111           Sun Microsystems RPC
123           Network Time Protocol (NTP)
161           SNMP net monitor
162           SNMP traps
512           UNIX comsat
513           UNIX rwho daemon
514           System log
525           Time daemon

TLP 11

Assigned, Registered and Dynamic Port Numbers

• RFC 1700.
• FTP site: ftp://isi.edu./in-notes/iana/assignments.
  – Up-to-date assignments of numbers
• Assigned port numbers range from 0 to 1023.
  – Assigned numbers are reserved by IANA and cannot be used for other purposes
  – Used for TCP, IP, UDP and various applications such as TELNET
• Registered port numbers range from 1024 to 65535; these belong to companies that have registered their applications.
• Dynamic port numbers are also in the range 1024 to 65535.

[ check with Unix/Linux files: /etc/services ]

TLP 12

How Demultiplexing Works

• When a host receives IP datagrams . . .
  – each datagram has a source IP address and a destination IP address
  – each datagram carries 1 transport-layer segment
  – each segment has a source and a destination port number (recall: well-known port numbers for specific applications)
• The host uses the IP address & port number to direct the segment to the appropriate socket (w.r.t. a process)

TCP/UDP segment format (32 bits wide):
  source port # | dest port #
  other header fields
  application data (message)
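The lookup a host performs when demultiplexing can be sketched as a table keyed by addresses and ports. This is only an illustration under stated assumptions: the table names, the `demux` function and the process labels are hypothetical, and the sketch assumes the usual convention that a UDP socket is identified by (dest IP, dest port) while a TCP socket is identified by the full 4-tuple. The addresses and ports are the ones used in the Web server example of these slides.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables standing in for the host's socket tables.
UdpKey = Tuple[str, int]            # (dest IP, dest port)
TcpKey = Tuple[str, int, str, int]  # (src IP, src port, dest IP, dest port)

udp_sockets: Dict[UdpKey, str] = {
    ("140.124.13.3", 53): "dns-process",   # illustrative entry
}
tcp_sockets: Dict[TcpKey, str] = {
    ("140.112.234.2", 7976, "140.124.13.3", 80): "web-P1",
    ("140.112.234.2", 8879, "140.124.13.3", 80): "web-P2",
    ("140.124.70.13", 8879, "140.124.13.3", 80): "web-P3",
}


def demux(proto: str, src_ip: str, src_port: int,
          dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the label of the process whose socket should get the segment."""
    if proto == "udp":
        # UDP: only the destination side selects the socket.
        return udp_sockets.get((dst_ip, dst_port))
    if proto == "tcp":
        # TCP: the same destination port can serve many connections,
        # so the source IP and port are part of the key as well.
        return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))
    return None  # unknown transport protocol
```

Two clients that happen to pick the same source port (8879 here) still reach different server processes, because their source IP addresses differ.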

TLP 13

Mux/DeMux (TCP): Example I

• One process to one connection

[Figure: host A runs a Telnet client (process P’) and server B runs a Telnet server (process P), each over application / transport / network layers.]

  - A to B: src port: 5678, dest. port: 23
  - B to A: src port: 23, dest. port: 5678

TLP 14

Mux/DeMux (TCP): Example II

• Multiple connections to multiple processes

[Figure: Web clients on host A and host C (client processes P1’, P2’) connect to the Web server on host B, whose transport layer demultiplexes (DeMUX) the segments to server processes P1, P2 and P3.]

B’s IP = 140.124.13.3, well-known port = 80

Segments arriving at B:
  - C’s IP: 140.112.234.2, Dest IP: B, src port: 7976, dest. port: 80
  - C’s IP: 140.112.234.2, Dest IP: B, src port: 8879, dest. port: 80
  - Src IP: 140.124.70.13, Dest IP: 140.124.13.3, src port: 8879, dest. port: 80


TLP 16

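MUX/DeMUX Happens Everywhere

[Figure: multiplexing on the way down the stack and demultiplexing on the way up (from the physical link) at every layer.]

• Ntwk access layer: data link (e.g., Ethernet) and medium; PDU: frame + bits
  - frames are demultiplexed to IP, ARP or RARP based on the frame’s L/T, through the interface SAP
• Internet layer (S/W modules): IP, ICMP, IGMP, ARP, RARP; PDU: packet (datagram)
  - packets are demultiplexed to TCP or UDP based on protocol type
• TP layer: TCP, UDP; PDU: segment or datagram
  - segments are demultiplexed to applications based on port #
• AP layer: SMTP, FTP, TELNET, PING, TRACEROUTE, DNS, SNMP, BOOTP, NTP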

TLP 17

UDP: User Datagram Protocol [RFC 768]

Why is there a UDP?

• no connection setup/establishment time (which can add delay)
• simple: no connection state at the sender’s and receiver’s app
• small segment header
  - low overhead
• no congestion control: UDP can blast away as fast as desired (unregulated sending rate)

“No frills,” “bare bones” Internet transport protocol

• “best effort” service; UDP segments may be:
  - lost
  - delivered out of order to applications
• connectionless:
  - no handshaking between UDP sender and receiver
  - each UDP segment is handled independently of others

TLP 18

UDP Header and Segment Format

• Often used for streaming multimedia apps, which are
  - loss tolerant
  - rate sensitive
• Other UDP uses:
  - DNS
  - SNMP
• Reliable transfer over UDP: add reliability at the application layer
  - application-specific error recovery!

UDP segment format (32 bits wide):
  source port # | dest port #
  length | checksum
  application data (message)

Length: length, in bytes, of the UDP segment, including the header
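As a small illustration of the four 16-bit header fields, the following sketch packs and unpacks a UDP segment with Python's standard `struct` module. The helper names `build_udp_segment` and `parse_udp_segment` are made up for this example, and the checksum is simply left at 0 (meaning "not used") rather than computed.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_segment(src_port: int, dst_port: int,
                      payload: bytes, checksum: int = 0) -> bytes:
    """Prepend an 8-byte UDP header; length covers header plus data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment: bytes) -> dict:
    """Split a UDP segment into its header fields and its payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": segment[UDP_HEADER.size:length],
    }
```

For example, `parse_udp_segment(build_udp_segment(5678, 53, b"hello"))` reports a length of 13: 8 bytes of header plus 5 bytes of data.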

TLP 19

UDP Checksum

Goal: detect “errors” (e.g., flipped bits) in the transmitted segment

Sender:
• treat segment contents as a sequence of 16-bit integers
• checksum: 1’s complement of the addition (1’s complement sum) of the segment contents
• sender puts the checksum value into the UDP checksum field

Receiver:
• compute the checksum of the received segment
• check if the computed checksum equals the checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless?

See next slide for implementation details (pp. 200-201)

TLP 20

Checksum in the UDP Header

• ChecKSum (CKS): same as IP’s CKS, with the following differences:
  1. Allows an odd # of data bytes (by padding one byte of “0”, which is not transmitted)
  2. Includes a pseudo-header taken from the IP header (12 bytes counted in total):
     SRC IP (4B), DEST IP (4B), 00 + Protocol (2B), UDP length (2B)
• Goal: to verify that the UDP DG has reached its correct destination
• No CKS is used if the transmitted CKS = all 0’s.
• Transmit 65535 if the computed CKS = all 0’s (one’s complement)
• CKS covers the pseudo hdr, the UDP header and the UDP data (plus 8 bits of 0’s if necessary)
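The rules above can be sketched in a few lines of Python. This is a minimal sketch for IPv4 only; the function names are illustrative, and the caller is assumed to pass a UDP segment whose checksum field is still zero. Protocol number 17 is UDP's value in the IP header.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad for the computation only; never transmitted
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """12 bytes: SRC IP, DEST IP, a zero byte, protocol (17), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum over pseudo-header + UDP header + data (checksum field = 0)."""
    covered = pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment
    checksum = ~ones_complement_sum(covered) & 0xFFFF
    # A computed value of 0 is sent as 65535, since 0 means "no checksum".
    return checksum or 0xFFFF
```

A receiver can verify a segment by summing the pseudo-header and the received segment, checksum field included: an undamaged segment gives 0xFFFF.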

TLP 21

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no “message boundaries”
• pipelined:
  - TCP congestion and flow control set the window size
• send & receive buffers
• full duplex data:
  - bi-directional data flow in the same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door.]

TLP 22

Principles of Reliable Data Transfer

• Important issue in application, transport, and link layers
• Top of important networking topics!
• Characteristics of an unreliable channel will determine the complexity of a reliable data transfer (rdt) protocol.
• udt ~ unreliable data transfer protocol (IP, here)

[Figure: the rdt protocol sits above the network layer. One routine is called when data arrives and one when a pkt arrives (details coming next).]

TLP 23

Reliable data transfer: getting started

• rdt_send(): called from above (e.g., by app.). Passes data to deliver to the receiver’s upper layer
• udt_send(): called by rdt, to transfer a packet over the unreliable channel to the receiver
• rdt_rcv(): called when a packet arrives on the rcv-side of the channel
• deliver_data(): called by rdt to deliver data to the upper layer

[Figure: send side and receive side.]

TLP 24

IP contradicts TCP?

• TCP provides completely reliable transfer
• (But) IP offers best-effort (unreliable) delivery
• TCP uses IP? (YES) How is it done?

Reliable data transmission relies on . . .
- Positive acknowledgment
  ~ Receiver returns a short message (called ACK, acknowledgement) to the sender when data arrives
- Retransmission (upon timeout)
  ~ Sender starts a timer whenever a segment is transmitted
  ~ If the timer expires before the acknowledgment arrives, the sender retransmits THE message

• Recall: C.O. (connection-oriented) vs. C.L. (connectionless)
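To see how positive acknowledgment and retransmission combine, here is a toy simulation, not a real protocol implementation. The function name `stop_and_wait` and the scripted `outcomes` list ("ok", "data_lost", "ack_lost") are assumptions made for the example; a timeout is modelled simply as "no ACK came back for this transmission".

```python
def stop_and_wait(messages, outcomes):
    """Send messages one at a time, retransmitting until each is ACKed.

    `outcomes` scripts what happens to each transmission in turn:
    "ok", "data_lost" or "ack_lost" (anything past the end counts as "ok").
    Returns the messages delivered to the receiving application and the
    total number of transmissions the sender needed.
    """
    script = iter(outcomes)
    delivered = []
    expected = 0        # receiver state: next sequence number it wants
    transmissions = 0
    for seq, msg in enumerate(messages):
        acked = False
        while not acked:
            transmissions += 1          # (re)transmit and start the timer
            outcome = next(script, "ok")
            if outcome == "data_lost":
                continue                # timer expires without an ACK
            if seq == expected:         # new data: deliver it exactly once
                delivered.append(msg)
                expected += 1
            # A duplicate is discarded, but it is ACKed again.
            acked = outcome != "ack_lost"   # a lost ACK also ends in a timeout
    return delivered, transmissions
```

With `outcomes = ["ok", "data_lost", "ok", "ack_lost", "ok"]`, three messages take five transmissions, and the receiver still delivers each message exactly once because it recognises the retransmitted duplicate.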

TLP 25

TCP Header - I

[Figure: TCP header layout, with the head length and receiver window size fields labelled.]

• TCP packs data into “segments” but counts and tracks it by bytes.

• Seq# and Ack#: counted in bytes of data (not segments)!

TLP 26

TCP Header – II

• Sequence number (SEQ #):
  - identifies each byte in the stream of data from the sending TCP to the receiving TCP (byte streams)
  - numbering ranges from 0 to 2^32 - 1 and wraps back around to 0
  - SEQ # = the (so-called) initial SEQ # (ISN) when the SYN flag = 1 (the first data segment starts at ISN + 1)

• Acknowledgment number:
  - the next sequence number that the receiver expects to receive (i.e., the piggybacked ACK)
    = the SEQ # of the last successfully received data byte + 1
  - (ACK flag = 1 once the connection is first established)

The SEQ # is bound to octets rather than to entire segments.

TLP 27

TCP Header – III

• Data Offset = header length (HL) in 32-bit words (60 bytes max)

• Code bits:
  - URG: “urgent pointer” field is valid (when it is set to 1)
  - ACK: makes the ACK number valid (when it is set to 1)
  - PSH: sender should send out all data in the sending buffer; receiver should pass this data to the application ASAP
  - RST: reset the connection (port unreachable)
  - SYN: synchronize sequence numbers to initiate a connection
  - FIN: sender is finished sending data (ask to close connection)
  (RST, SYN and FIN are used for conn. management)

• Window (for credit allocation flow control): indicates the number of bytes the sender of this segment is willing to accept
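The data offset and the six code bits share one 16-bit word of the TCP header, so they can be decoded with a shift and a few bit masks. The sketch below assumes the classic layout (4-bit data offset, reserved bits, then URG, ACK, PSH, RST, SYN, FIN from high to low); the function name is made up for the example.

```python
# Bit masks for the six classic code bits, lowest bit first.
TCP_FLAGS = (
    ("FIN", 0x01),
    ("SYN", 0x02),
    ("RST", 0x04),
    ("PSH", 0x08),
    ("ACK", 0x10),
    ("URG", 0x20),
)


def decode_offset_and_flags(word: int):
    """Decode the 16-bit word holding the data offset and the code bits.

    Returns (header length in bytes, list of flag names that are set).
    """
    header_len_bytes = (word >> 12) * 4   # data offset counts 32-bit words
    flags = [name for name, bit in TCP_FLAGS if word & bit]
    return header_len_bytes, flags
```

A data offset of 5 gives the minimal 20-byte header, and the largest 4-bit value, 15, gives the 60-byte maximum mentioned above.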

TLP 28

TCP Connection Establishment

• Establishing a connection between two ends before exchanging data
• Connection establishment protocol ~ a three-way handshake

Open a conn. = open a socket. The server listens (passive open); the client performs an active open:

  1. Client to server: SYN = 1, Seq# = j (j = client’s ISN); client enters SYN_SENT
  2. Server to client: SYN with Seq# = k, ACK = j+1 (k ~ Rxer’s seq #, its ISN); server enters SYN_RCVD
  3. Client to server: ACK = k+1; the connection is Established at both ends

• Each end initializes its TCP variables: seq. #, buffers, flow control info (e.g., RcvWindow)
- SYN consumes one sequence number
- ISN should change over time (differs from connection to connection)

TLP 29

Decompose PDUs in a TCP/IP Scenario

Windows> telnet 140.124.70.26 (showing the first two packets sent by the client)

Protocol #: Network--Transport layer

TLP 30

(PDU cont’d)

Src port # (randomly generated by the src PC: 1059, here)
Dest port # (a well-known one, for a well-known application)

Port #: Transport--Application layer

(Selective ACKnowledgment) (for reliable, in-order reception)
- see next pages

TLP 31

Stop-and-Wait Protocol

• Sends one segment and waits for the ACK to return before sending the next segment (performance)

[Figure: sender/receiver timeline over the pipe (assuming no error).
  - first packet bit transmitted, t = 0
  - last packet bit transmitted, t = L / R
  - first packet bit arrives
  - last packet bit arrives, send ACK
  - ACK arrives, send next packet, t = RTT + L / R]

TLP 32

Performance of Stop-and-Wait Protocol (rdt3.0, the alternating-bit protocol in the textbook)

• rdt3.0 works, but performance stinks
• Performance issue, example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:

$$T_{transmit} = \frac{L\ \text{(packet size)}}{R\ \text{(channel capacity)}} = \frac{8\ \text{kb/pkt}}{10^{9}\ \text{b/sec}} = 8\ \mu\text{sec}$$

• Sender/channel utilization (fraction of time the sender is busy sending bits into the channel), with RTT = 2 × 15 ms and the ACK transmission time ignored:

$$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008\ \text{ms}}{30.008\ \text{ms}} = 0.00027 \quad (\text{or } 0.027\%)$$

• (ref. P.214) The sender sends a 1KB pkt every 30.008 msec: effective throughput is only 267 kbps over a 1 Gbps link
• The network protocol limits the use of physical resources a lot!
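The numbers on this slide can be reproduced with a one-line formula. The helper below is a sketch with hypothetical names; its `window` parameter generalises the same formula to a sender that keeps several packets in flight per round trip, and the result is capped at 1.0 because a link cannot be more than fully used.

```python
def utilization(packet_bits: float, rate_bps: float,
                rtt_s: float, window: int = 1) -> float:
    """Fraction of time the sender is busy putting bits on the link.

    window=1 is stop-and-wait; window=N models N packets sent per RTT.
    """
    t_transmit = packet_bits / rate_bps          # L / R
    return min(1.0, window * t_transmit / (rtt_s + t_transmit))


# Values from the slide: 1 KB (8000-bit) packets, 1 Gbps link, 30 ms RTT.
u = utilization(8000, 1e9, 0.030)
throughput_kbps = u * 1e9 / 1000
```

Here `u` is about 0.00027 and `throughput_kbps` about 267, matching the slide.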

TLP 33

Pipelined protocols (Why needed?)

• Pipelining: allowing the sender to send multiple “in-flight”, yet-to-be-acknowledged pkts w/o waiting for ACKs (filling a pipeline)
• For reliable data transfer:
  - the range of sequence numbers must be increased (not retx.)
  - need to buffer more than one packet at sender and/or receiver
• Seq. # range and buffering depend on the manner in which a data transfer protocol responds to lost, corrupted, and overly delayed packets.
• Two generic forms of pipelined protocols (pipelined with error recovery): go-Back-N and Selective repeat

TLP 34

Pipelining: increasing utilization

[Figure: sender/receiver timeline with three packets in flight (assuming no error).
  - first packet bit transmitted, t = 0
  - last bit transmitted, t = L / R
  - first packet bit arrives
  - last packet bit arrives, send ACK
  - last bit of 2nd packet arrives, send ACK
  - last bit of 3rd packet arrives, send ACK
  - ACK arrives, send next packet, t = RTT + L / R (next cycle begins)]

Increase utilization by a factor of 3!

$$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008 \quad (0.08\%)$$

TLP 35

Go-Back-N

Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unAck’ed pkts allowed (the window size)
  - Preview: sliding window
• ACK(n): ACKs all pkts up to and including seq # n ~ “cumulative ACK” (Advantage: see Fig. 3.34)
  - may receive duplicate ACKs (see receiver) ?? You find it out.
• Set a timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in the window

TLP 36

GBN (Cont’d)

Receiver:
• ACK-only: always send an ACK for the correctly-received pkt with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember the expected seqnum
• out-of-order packet:
  - discard (don’t buffer): no receiver buffering
  - ACK the pkt with the highest in-order seq #
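The receiver rules above fit in a few lines. This sketch uses a hypothetical `GbnReceiver` class and, as a simplifying assumption, unbounded sequence numbers starting at 0 (no k-bit wrap-around); an ACK value of -1 stands for "nothing received in order yet".

```python
class GbnReceiver:
    """Go-Back-N receiver: keeps only the next expected sequence number."""

    def __init__(self):
        self.expected = 0     # the only state the receiver needs
        self.delivered = []   # data handed to the upper layer, in order

    def receive(self, seq: int, data):
        """Handle one arriving packet and return the cumulative ACK to send."""
        if seq == self.expected:
            self.delivered.append(data)   # in order: deliver and advance
            self.expected += 1
        # Out-of-order packets are discarded (no buffering); either way the
        # ACK names the highest in-order sequence number received so far.
        return self.expected - 1
```

Feeding it packets 0, 1, 3, 4, 2, 3 produces the ACKs 0, 1, 1, 1, 2, 3: packets 3 and 4 are discarded on first arrival, which is where the duplicate ACKs come from.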

TLP 37

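GBN in action

[Figure: Go-Back-N timeline. After a loss, the receiver discards each out-of-order packet that follows, until the sender times out and retransmits (reTx).]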

TLP 38

Selective Repeat/Reject

• Receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to the upper layer
• Sender only resends pkts for which an ACK was not received
  - sender timer for each unACKed pkt
• Sender window
  - N consecutive seq #’s
  - again limits the seq #s of sent and unACKed pkts

TLP 39

Selective repeat: sender, receiver windows

(Read: Fig. 3.23-25 for the sender’s and receiver’s events and actions)

TLP 40

Selective Repeat in action

Window size = 4

loss

TLP 41

Selective Repeat: a dilemma (Sec. 3.4.4)

Example:
• seq #’s: 0, 1, 2, 3 (size = 4)
• window size = 3 < seq # size
• Receiver sees no difference between scenarios (a) and (b); the sender’s side is hidden behind an “invisible curtain” (the Internet):
  - (a) the sender ReTx the 1st pkt (duplicate pkt 0)
  - (b) the sender Tx the 5th pkt (new pkt 0)
• Receiver incorrectly passes duplicate data up as new in case (a)

Q: To prevent this ambiguity, what should be the relationship between seq # size and window size?

A: sequence # space >= 2*window (Problem 3.18)
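The dilemma can be checked mechanically. The sketch below (function name hypothetical) looks at the worst case described on this slide: the receiver has accepted a full window and moved on, but every ACK was lost, so the sender may still retransmit the old window. If an old sequence number also appears in the receiver's new window, a duplicate can be mistaken for new data.

```python
def sr_is_ambiguous(seq_space: int, window: int) -> bool:
    """True if a retransmitted packet could be mistaken for a new one.

    Assumes 1 <= window <= seq_space and sequence numbers taken modulo
    seq_space.
    """
    # Sequence numbers the sender may still retransmit (first window).
    old = {s % seq_space for s in range(0, window)}
    # Sequence numbers the receiver now accepts as new (next window).
    new = {s % seq_space for s in range(window, 2 * window)}
    return bool(old & new)
```

With the slide's values, `sr_is_ambiguous(4, 3)` is true (the new window 3, 0, 1 overlaps the old window 0, 1, 2), while `sr_is_ambiguous(4, 2)` is false, in line with the rule sequence # space >= 2*window.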

TLP 42

Connection Maintenance (Ex: Telnet Scenario), an interactive application

Q: How does the receiver handle out-of-order segments?
A: The TCP spec doesn’t say ~ up to the implementor (go-back-N or Selective Repeat)

Segment exchange between Host A and Host B (time runs downward):
  1. User types ‘C’. A to B: Seq=42, ACK=79, data = ‘C’
  2. Host B ACKs receipt of ‘C’ and echoes back ‘C’. B to A: Seq=79, ACK=43, data = ‘C’
  3. Host A ACKs receipt of the echoed ‘C’. A to B: Seq=43, ACK=80, . . .

• “Echo back” ensures that characters seen by the Telnet user have already been received and processed at the remote site.
• Each character traverses the network twice

TLP 43

TCP Connection Termination

Close a conn. = close a socket.

  1. Client (active close) to server: FIN = 1, Seq# = M; client enters FIN_WAIT_1
  2. Server (passive close) to client: ACK = 1, Ack# = M+1; server enters CLOSE_WAIT, client enters FIN_WAIT_2
  3. Server to client: FIN with Seq# = N; server enters LAST_ACK (closing)
  4. Client to server: ACK = N+1; client enters TIME_WAIT (timed wait, 2 MSL wait state) so that it can resend the ACK in case it is lost; the server enters CLOSED if the ACK is rxed
  5. After the timed wait the client enters CLOSED. Resources at both C and S are deallocated.

• MSL = Max Segment Lifetime; MSL in RFC 793 = 2 min, max.
• Connection termination protocol: 3-way, but taking four segments

TLP 44

TCP Connection Management

[Figures: TCP client lifecycle and TCP server lifecycle.]

TLP 45

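TCP: retransmission scenarios

Lost ACK scenario:
  1. Host A sends Seq=92, 8 bytes data
  2. Host B replies ACK=100, which is lost (X)
  3. A’s timeout expires; A resends Seq=92, 8 bytes data
  4. Host B receives a duplicate and replies ACK=100. (Duplicated. Host B’s action?)

Premature timeout, cumulative ACKs:
  1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
  2. Host B replies ACK=100, then ACK=120
  3. The Seq=92 timeout expires before its ACK arrives; A resends Seq=92, 8 bytes data (new timeout for seq.=92)
  4. Host B replies with the cumulative ACK=120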

TLP 46

TCP Flow Control

• flow control: sender won’t overrun the receiver’s buffers by transmitting too much, too fast
• receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space
  - RcvWindow field in the TCP segment
• sender: keeps the amount of transmitted, unACKed data within the most recently received RcvWindow
  - guarantees the receive buffer doesn’t overflow

RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer
          = RcvBuffer - [LastByteRcvd - LastByteRead]
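The bookkeeping on both sides is plain arithmetic, as this sketch shows. The function names and the sample numbers are made up for illustration; the variable names follow the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(nbytes: int, last_byte_sent: int, last_byte_acked: int,
             advertised_window: int) -> bool:
    """Sender side: would sending nbytes keep the unACKed data within
    the most recently advertised RcvWindow?"""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window
```

For instance, a 4096-byte buffer holding 2000 unread bytes advertises `rcv_window(4096, 3000, 1000) == 2096`; a sender with 1000 bytes in flight may then send another 1000 bytes but not another 1200.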

TLP 47

Flow Control - Sliding Window

• To improve the utilization of the channel when Tprop > Tframe, by allowing multiple frames to be transmitted before receiving ACK(s) (i.e., to improve on the performance of the stop-and-wait mechanism)
• To keep track of which frames have been ACKed while sending without waiting, each frame is labeled with a sequence number.
• Rule of the sliding window (window of frames):
  - Txer maintains a list of SEQ numbers that it is allowed to send
  - Rxer maintains a list of SEQ numbers that it is prepared to receive
  - Frames are numbered (0 ~ 2^k - 1) modulo 2^k, k = # of bits in SEQ #
  - The window size is bounded by 2^k (?), and the SEQ # has a bounded size since it occupies a field in the frame
  - Sender must buffer these frames in case they need to be retransmitted
• Applied to Go-back-N and Selective-reject ARQ, and to LLC, HDLC, and X.25

TLP 48

Sliding window flow control (cont’d)

[Figure: sliding window example. ACKs are carried as RR ~ Receiver Ready (in HDLC).]

Back to GBN

TLP 49

TCP Flow Control - Credit Allocation

• Operation:
  - Sending TCP includes the SEQ # of the first byte of the segment in the segment’s SEQ # field
  - Receiving TCP ACKs an incoming segment with (A=i, W=j), where
    A=i: expecting SEQ = i, and all SEQ prior to i are ACKed
    W=j: granting permission to send an additional j (window) bytes, i.e., corresponding to SEQ # i ~ (i+j-1)

• Some examples of granting credit [assuming the Rxer just issued (A=i, W=j)]:
  - Rxer issues (A=i, W=k) to increase the credit to k (k > j) when no additional data have arrived
  - Rxer issues (A=i+m, W=j-m), without granting additional credit, to ACK an incoming segment containing m bytes (m < j)

TLP 50

Example

- sending 200 bytes/segment; sending and receiving SEQ #s are synchronized through connection establishment
- initial credit = 1400 bytes, and SEQ # = 1001, i.e., A=1001, W=1400

[Figure: credit allocation timeline showing the granted permission, the remaining credits, and a grant of + 600.]

TLP 51

TCP Round Trip Time and Timeout

Q: How to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions and cumulatively ACKed segments
• SampleRTT will vary; want the estimated RTT “smoother”
  - average several recent measurements, not just the current SampleRTT

Q: How to set the TCP timeout value?
• longer than RTT
  - note: RTT will vary
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss

TLP 52

Estimation of RTT

$$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$$

- Exponential weighted moving average (why?)
- influence of a given sample decreases exponentially fast
- typical value of α = 0.125 (RFC 2988)

[Figure: Sample RTT and Estimated RTT, from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), y-axis: RTT (milliseconds), 100 to 350.]

TLP 53

RTO (Retransmission Time Out)

Setting the timeout
❒ EstimatedRTT plus “safety margin”
  ❍ large variation in EstimatedRTT -> larger safety margin
❒ First estimate how much SampleRTT deviates from EstimatedRTT:

$$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$$

(typically, β = 0.25)

Then set the timeout interval:

$$TimeoutInterval\ (RTO) = EstimatedRTT + 4 \cdot DevRTT$$
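The two moving averages and the timeout rule can be combined in a small class. This is a sketch rather than any stack's actual implementation: the class name is hypothetical, the weights default to the typical values 0.125 and 0.25, the deviation is updated before the estimate (so it uses the previous EstimatedRTT), and the initial deviation of half the first sample is an assumption borrowed from RFC 6298.

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout (times in ms)."""

    def __init__(self, first_sample: float,
                 alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed starting value

    def timeout_interval(self) -> float:
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new timeout interval."""
        # Deviation first, measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()
```

Starting from a 100 ms sample, one further 100 ms sample gives a timeout of 250 ms; a sudden 200 ms sample then raises both the estimate (to 112.5 ms) and the margin, for a timeout of 325 ms.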

TLP 54

Principles of Congestion Control

Congestion:
• informally: “too many sources sending too much data too fast for the network to handle”
• different from flow control (w.r.t. receiver)
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!

TLP 55

Approaches towards congestion control

Two broad approaches towards congestion control (Ref: Sec 3.6.2~3.6.3):

1. End-end congestion control:
❒ no explicit feedback from the network
❒ congestion inferred from end-system observed loss, delay
❒ approach taken by TCP

2. Network-assisted congestion control:
❒ routers provide feedback to end systems (Ex: choke packet in PSN)
  ❍ single bit indicating congestion (SNA, DECbit, TCP/IP ECN (RFC 2481), ATM)
  ❍ explicit rate the sender should send at

TLP 56

Case study: ATM ABR congestion control

ABR: available bit rate
❒ “elastic service”
❒ if the sender’s path is “underloaded”:
  ❍ sender should use the available bandwidth
❒ if the sender’s path is congested:
  ❍ sender is throttled to the minimum guaranteed rate

RM (resource management) cells:
❒ sent by the sender, interspersed with data cells
❒ two bits in each RM cell are set by switches (i.e., “network-assisted”)
  ❍ NI bit: No Increase in rate (mild congestion)
  ❍ CI bit: Congestion Indication
❒ RM cells are returned to the sender by the receiver, with bits intact

TLP 57

TCP Congestion Control: Overview

“Probing” for usable bandwidth:
- ideally: transmit as fast as possible (Congwin as large as possible) without loss
- increase Congwin until loss (congestion)
- loss happened: decrease Congwin, then begin probing (increasing) again

Important variables:
- Congwin:
  ~ congestion window size
- threshold:
  ~ defines the threshold between the two phases: the slow start phase and the congestion control phase

Abbreviations:
• Congwin = cwnd
• threshold = ssthresh

TLP 58

TCP Congestion Control (cont’d)

❒ end-end control (no network assistance)
❒ sender limits transmission:

$$LastByteSent - LastByteAcked \le CongWin$$

❒ Roughly,

$$rate = \frac{CongWin}{RTT}\ \text{Bytes/sec}$$

❒ CongWin is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
❒ loss event = timeout or 3 duplicate acks
❒ TCP sender reduces rate (CongWin) after a loss event

Three mechanisms:
  ❍ AIMD
  ❍ Slow start
  ❍ Congestion Avoidance

TLP 59

I. TCP AIMD Congestion Control

❒ Additive Increase:
  ~ increase CongWin by 1 MSS every RTT in the absence of loss events: probing
❒ Multiplicative Decrease:
  ~ cut CongWin in half after a loss event

[Figure: AIMD operation. Congestion window (y-axis: 8, 16, 24 Kbytes) versus time.]

TLP 60

II. Slow Start

When a connection begins, increase the rate exponentially fast until the first loss event.

• Operation:
  - Initialize cwnd = 1 (1 MSS) whenever opening a new connection
  - Increase cwnd by 1 (up to a max) every time an ACK is received
  - At any time, TCP measures the congestion window in segments and restrains the transmission by

      awnd = Min { credit, cwnd }

    awnd = allowed window (currently allowed to send w/o receiving ACKs)
    cwnd = congestion window (used at startup and reduced during congestion)
    credit = receiver advertised window (used to calculate window/segment size)

• Slow start probes the internet to make sure not to send too many segments into an already congested network
• The connection’s data flow is controlled by the incoming ACKs (not cwnd)

TLP 61

Slow Start Operation

Really slow?
• Slow start may be a misnomer, since cwnd grows exponentially (pretty much close to)

- A is sending 100-byte segments
- A can fill the pipe with a continuous flow of segments after approximately FOUR RTTs

[Figure: slow start after the initialization of a new connection. Labels: SN = 1, ACK = 101 (1st RTT); SN = 101, SN = 201, ACK = 201 (2nd RTT); SN = 701, ACK = 801 (3rd RTT); SN = 1401, ACK = 1501 (4th RTT).]

TLP 62

III. Congestion Avoidance

• Also called dynamic window sizing on congestion (Jacobson [88/95])
  ~ modifies the growth of cwnd from exponential to linear
  ~ a way to deal with segment loss: a timeout occurring and receipt of duplicate ACKs

• Operation:
  - Begin with the slow start algorithm until congestion occurs; then:
  - Set ssthresh (a slow start threshold) = cwnd/2
  - Set cwnd = 1 and perform the slow start process (i.e., increase cwnd by 1 for every ACK received) until cwnd = ssthresh
  - For cwnd ≥ ssthresh, increase cwnd by one for each round-trip time (RTT)
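The operation above can be traced round by round. The following sketch is a simplification with assumed names and parameters: cwnd is counted in segments, one loop iteration is one RTT, "increase by 1 per ACK" is modelled as doubling per RTT (capped at ssthresh), and the rounds in which congestion is detected are supplied by the caller.

```python
def tahoe_cwnd_trace(rounds: int, loss_rounds, initial_ssthresh: int = 16):
    """Return cwnd (in segments) at the start of each RTT.

    loss_rounds: set of 1-based round numbers in which a loss is detected.
    """
    cwnd, ssthresh = 1, initial_ssthresh
    trace = []
    for rtt in range(1, rounds + 1):
        trace.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 1)      # remember half the window
            cwnd = 1                          # restart with slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)    # slow start: exponential
        else:
            cwnd += 1                         # avoidance: linear
    return trace
```

With a loss in round 5 at cwnd = 16, the trace is 1, 2, 4, 8, 16, 1, 2, 4, 8, 9: ssthresh becomes 8, slow start climbs back to 8, and growth is linear from there.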

TLP 63

Congestion avoidance (cont’d)

• Example

[Figure: cwnd versus RTT (1st RTT, ..., 6th RTT, ...). Slow start, ending with a timeout, gives ssthresh = 8; then exponential growth of cwnd up to ssthresh, followed by linear growth of cwnd (cwnd = 9, ...). (... counted as ONE more RTT)]

- check how long it would take to recover the cwnd level before congestion?

TLP 64

Comparison of Slow Start and Congestion Avoidance

[Figure: cwnd versus time (RTT). Exponential growth of cwnd from 1 up to ssthresh (8), then linear growth of cwnd (9, ...).]

(See what the textbook says.)

TLP 65

TCP Slow Start Algorithm

Slowstart algorithm:

    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR CongWin > threshold)

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT.]

• exponential increase (per RTT) in window size (not so slow!)
• loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)

TLP 66

TCP Congestion Avoidance: Tahoe

TCP Tahoe congestion avoidance:

    /* slowstart is over */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart

TLP 67

TCP Congestion Avoidance: Reno

TCP Reno congestion avoidance:

    /* slowstart is over */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    If (loss detected by timeout) {
        Congwin = 1
        perform slowstart
    }
    If (loss detected by triple duplicate ACK) {
        Congwin = Congwin/2
        Congwin increases linearly
    }

• Three duplicate ACKs (Reno TCP):
  • 3 dup ACKs indicate that the network is capable of delivering some segments: some segments are getting through correctly!
  • Don’t “overreact” by decreasing the window to 1 as in Tahoe
    – decrease the window size by half
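The difference between the two variants comes down to how they react to a loss event. This sketch, with a hypothetical function name and string labels, returns the new congestion window and threshold in segments; integer halving with a floor of 1 is a simplification chosen for the example.

```python
def on_loss(variant: str, cwnd: int, kind: str):
    """Return (new cwnd, new threshold) after a loss event.

    variant: "tahoe" or "reno"; kind: "timeout" or "triple_dup_ack".
    """
    if variant not in ("tahoe", "reno"):
        raise ValueError("unknown variant: %r" % variant)
    if kind not in ("timeout", "triple_dup_ack"):
        raise ValueError("unknown loss kind: %r" % kind)
    threshold = max(cwnd // 2, 1)
    if kind == "timeout" or variant == "tahoe":
        # Tahoe always, and Reno on a timeout: back to slow start.
        return 1, threshold
    # Reno on three duplicate ACKs: halve the window and keep
    # increasing linearly from there.
    return threshold, threshold
```

So with cwnd = 12, Tahoe drops to 1 on any loss, while Reno drops to 6 on three duplicate ACKs and to 1 only on a timeout; the threshold becomes 6 in every case.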

TLP 68

TCP Reno versus TCP Tahoe:

[Fig. 3-51: Evolution of TCP congestion window (Tahoe and Reno). x-axis: transmission round (1 to 15); y-axis: congestion window size in segments (0 to 14); curves: TCP Tahoe, TCP Reno, and the (variable) threshold.]

TLP 69

TCP Fairness

• TCP fairness goal:
  ~ if K TCP connections pass through a router (share the same bottleneck link), each TCP should get R/K of the link capacity

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

❒ Example: link of rate R supporting 9 connections;
  ❍ What if a new app asks for 1 TCP? It gets rate R/10
  ❍ What if a new app asks for 11 TCPs? It gets what? (A: R/2)

TLP 70

(Joined) Throughput Realized by Two TCPs

How does TCP approach fairness?

• Two competing sessions:
  – additive increase gives a slope of 1, as throughput increases
  – multiplicative decrease decreases throughput proportionally

[Figure: connection 2 throughput versus connection 1 throughput, each axis running to R, with the equal bandwidth share line. From an assumed starting point, the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]

Goal: the achieved throughput ends up somewhere around the intersection
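Why additive increase and multiplicative decrease pull two connections toward an equal share can be seen in a toy simulation. The sketch below makes strong simplifying assumptions that are not in the slides: both connections have the same RTT, both see every loss at the same time, and a loss occurs exactly when their combined rate exceeds the link capacity. The function name and units are illustrative.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Evolve the rates of two AIMD connections sharing one link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both add one unit, which leaves the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from very unequal rates such as 1 and 60 on a link of capacity 100, the gap shrinks by half at every loss, so after a few hundred rounds the two rates are practically equal.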

TLP 71

The End

Understanding the Computer

TLP 72