Transport Layer3-1 Transport Layer Outline r 3.1 Transport-layer services r 3.2 Multiplexing and...


Transport Layer 3-1

Transport Layer Outline

3.1 Transport-layer services

3.2 Multiplexing and demultiplexing

3.3 Connectionless transport: UDP

3.4 Principles of reliable data transfer

3.5 Connection-oriented transport: TCP (segment structure, reliable data transfer, flow control, connection management)

3.6 Principles of congestion control

3.7 TCP congestion control

Transport Layer 3-2

Recap: rdt3.0 sender (Stop-and-wait)

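Sender FSM with four states. Each line reads: event -> actions (-> next state, if the state changes).

State: Wait for call 0 from above
  rdt_send(data) -> sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer -> Wait for ACK0
  rdt_rcv(rcvpkt) -> no action

State: Wait for ACK0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ) -> no action
  timeout -> udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) -> stop_timer -> Wait for call 1 from above

State: Wait for call 1 from above
  rdt_send(data) -> sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer -> Wait for ACK1
  rdt_rcv(rcvpkt) -> no action

State: Wait for ACK1
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ) -> no action
  timeout -> udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1) -> stop_timer -> Wait for call 0 from above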

Transport Layer 3-3

Recap: rdt3.0: stop&wait op

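Timeline (sender on the left, receiver on the right; one RTT separates the end of transmission from the arrival of the ACK):

first packet bit transmitted, t = 0

last packet bit transmitted, t = L / R

first packet bit arrives at the receiver

last packet bit arrives at the receiver, send ACK

ACK arrives at the sender, send next packet, t = RTT + L / R

U_sender = (L / R) / (RTT + L / R) = 0.008 / 30.008 = 0.00027 (times in milliseconds)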

Transport Layer 3-4

Recap: Pipelining: increased utilization

Timeline (sender on the left, receiver on the right; three packets are sent back-to-back):

first packet bit transmitted, t = 0

last bit transmitted, t = L / R

first packet bit arrives at the receiver

last packet bit arrives at the receiver, send ACK

last bit of 2nd packet arrives, send ACK

last bit of 3rd packet arrives, send ACK

ACK arrives at the sender, send next packet, t = RTT + L / R

U_sender = (3 * L / R) / (RTT + L / R) = 0.024 / 30.008 = 0.0008 (times in milliseconds)

Sending three packets per round trip increases utilization by a factor of 3!
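The two utilization figures can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming L = 8000 bits, R = 1 Gbps and RTT = 30 ms; those link parameters are not stated on the slide and were chosen only because they give the 0.008 and 30.008 that appear in the formula. The function name and its `window` parameter are illustrative.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy when `window` packets are sent per round trip."""
    transmit_time = packet_bits / rate_bps              # L / R, in seconds
    return (window * transmit_time) / (rtt_s + transmit_time)

# Assumed link parameters (hypothetical): L = 8000 bits, R = 1 Gbps, RTT = 30 ms
stop_and_wait = sender_utilization(8000, 1e9, 0.030)
pipelined = sender_utilization(8000, 1e9, 0.030, window=3)

print(f"stop-and-wait: {stop_and_wait:.5f}")
print(f"3-packet pipeline: {pipelined:.5f}")
```

Because only the numerator changes, the ratio of the two results is exactly the window size.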

Transport Layer 3-5

Recap: GBN for Pipelined Error Recovery

Sender:
There is a k-bit sequence # in the packet header.
A “window” of up to N consecutive unacknowledged (sent or can-be-sent) packets is allowed.
The window moves forward by 1 packet at a time when its 1st sent pkt is acknowledged (standard behavior).
The window cannot contain acknowledged pkts.

Sender must respond to three types of events:
1- Invocation from above: the application layer tries to send a packet. If the window is full, the packet is returned; otherwise the packet is accepted and sent.
2- Receipt of an ACK: ACK(n) indicates that all pkts up to and including seq # n have been received (“cumulative ACK”). The sender may receive duplicate ACKs (when the receiver receives out-of-order packets).
3- A timeout event (the only cause of retransmission): timer for each in-flight pkt. If a timeout occurs, retransmit the packets that have not been acknowledged.
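The three sender events can be sketched as bookkeeping on two counters. This is an illustrative sketch only: the class and method names are made up, sequence numbers are plain integers (the k-bit wraparound is ignored), and no real timer or network is involved.

```python
class GBNSender:
    """Go-Back-N sender bookkeeping; sequence-number wraparound is ignored (assumption)."""

    def __init__(self, window_size):
        self.N = window_size
        self.base = 0        # oldest unacknowledged sequence number
        self.next_seq = 0    # next sequence number to assign

    def send(self):
        """Event 1: call from above. Returns the seq # sent, or None if the window is full."""
        if self.next_seq >= self.base + self.N:
            return None      # window full: the packet is refused
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, n):
        """Event 2: cumulative ACK(n) covers every packet up to and including n."""
        if self.base <= n < self.next_seq:   # duplicate or stale ACKs are ignored
            self.base = n + 1

    def on_timeout(self):
        """Event 3: return the seq #s of all sent-but-unacknowledged packets to resend."""
        return list(range(self.base, self.next_seq))


s = GBNSender(window_size=3)
sent = [s.send() for _ in range(4)]   # the fourth call hits a full window
s.on_ack(1)                           # packets 0 and 1 are acknowledged together
resend = s.on_timeout()               # only packet 2 is still outstanding
```

After the ACK the window has slid forward by two, so the sender can accept new packets again.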

Transport Layer 3-6

Recap: Selective repeat for error recovery

Window may contain acknowledged pkts (unlike GBN)

Transport Layer 3-7

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver; no one-to-many multicast

connection-oriented: processes must handshake before sending data; the three-way handshake (an exchange of control msgs) initializes sender and receiver state before data exchange

full duplex data: bi-directional data flow in the same connection at the same time

pipelined: TCP congestion and flow control set the window size

flow controlled: sender will not overwhelm receiver

send & receive buffers: set aside during the 3-way handshake

Figure: the sending application writes data through its socket door into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving application reads it through its own socket door.

Transport Layer 3-8

TCP: Overview - cont

Maximum Segment Size (MSS):

Defined as the maximum amount of application-layer data in a TCP segment.

TCP grabs data in chunks from the send buffer; the maximum chunk size is the MSS. A TCP segment contains a TCP header plus up to MSS bytes of data.

MSS is set by determining the largest link-layer frame (Maximum Transmission Unit or MTU) that can be sent by the local host.

MSS is chosen so that a segment carrying MSS bytes of data, placed in an IP datagram, fits into a single link-layer frame. Common values of MSS are 1460 bytes, 536 bytes and 512 bytes.

TCP sequence #s:

Both sides randomly choose initial seq #s (other than 0) to prevent receiving segments of older connections that were using the same ports.

TCP views data as an unstructured but ordered stream of bytes, so seq #s are over the stream of bytes.

Example: with a file size of 500,000 bytes and MSS = 1,000 bytes, the segment seq #s are 0, 1000, 2000, etc.

TCP acknowledgement #s:

Uses cumulative acks: TCP only acks bytes up to the first missing byte in the stream.

TCP RFCs do not address how to handle out-of-order segments.

The ACK # field holds the offset of the next byte that the sender or receiver is expecting.

Transport Layer 3-9

TCP segment structure

Figure: TCP segment layout, 32 bits wide. Fields in order:

source port #, dest port #
sequence number
acknowledgement number
header length, not used, flag bits U A P R S F, receive window
checksum, urgent data pointer
options (variable length), used to negotiate MSS
application data (variable length)

Field notes:

sequence and acknowledgement numbers: count bytes of data (not segments!); the 32-bit sequence number space covers 2^32 bytes (4 GB); total # segments = filesize/MSS

header length: 4 bits, measured in 32-bit words

URG: urgent data (generally not used)

ACK: ACK # valid

PSH: push data now to upper layer

SYN/FIN: connection setup and close

RST=1: used in response when a client tries to connect to a non-open server port

receive window: 16 bits, # bytes receiver is willing to accept (RcvWindow size)

checksum: Internet checksum (as in UDP)

Transport Layer 3-10

Seq Numbers and Ack Numbers

Suppose a data stream of size 500,000 bytes and an MSS of 1,000 bytes; the first byte of the data stream is numbered zero.

Seq number of the segments:

• 1st seg: 0; 2nd seg: 1000; 3rd seg: 2000, …

Ack number:

Assume host A is sending a seg to host B. Because TCP is full-duplex, A may be receiving data from B simultaneously.

The ack number that host B puts in its seg is the seq number of the next byte B is expecting from A.

• Example: B has received all bytes numbered 0 through 535 from A. If B is about to send a segment to host A, the ack number in its segment should be 536.
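Both numbering rules are simple enough to check in code. The sketch below is illustrative: the function names are made up, and the receiver's state is modelled as a plain set of received byte numbers, which a real TCP would never do.

```python
def segment_seq_numbers(stream_bytes, mss, first_byte=0):
    """Sequence number of each segment = stream offset of its first data byte."""
    return list(range(first_byte, first_byte + stream_bytes, mss))

def cumulative_ack(received_byte_numbers, first_byte=0):
    """Ack number = the first byte not yet received, scanning from the start of the stream."""
    ack = first_byte
    while ack in received_byte_numbers:
        ack += 1
    return ack

seqs = segment_seq_numbers(500_000, 1_000)     # the 500,000-byte stream from the slide
ack = cumulative_ack(set(range(0, 536)))       # B holds bytes 0 through 535

# Bytes beyond a gap do not move the cumulative ack forward.
ack_with_gap = cumulative_ack(set(range(0, 536)) | set(range(900, 1000)))
```

The stream is split into 500 segments, and the ack number stays at the first missing byte even when later bytes have arrived.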

Transport Layer 3-11

TCP seq. #’s and ACKs - Telnet example

Telnet uses “echo back” to ensure that the characters seen by the user have already been received and processed at the server.

Assume starting seq #s are 42 and 79 for client and server respectively.

After the connection is established, the client is waiting for byte 79 and the server for byte 42.

Seq. #’s: byte stream “number” of the first byte in the segment’s data

ACKs: seq # of the next byte expected from the other side (cumulative ACK)

Simple telnet scenario (Host A is the client, Host B is the server; time runs downward):

1. User types ‘C’. A -> B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’ and echoes back ‘C’. B -> A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of the echoed ‘C’. A -> B: Seq=43, ACK=80

Transport Layer 3-12

TCP Round Trip Time and Timeout

Q: how to set the TCP timeout value? (timer management)

based on RTT: must be longer than RTT, but RTT varies

too short: premature timeout, unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission (handing the segment to IP) until ACK receipt; ignore retransmissions (why?)

SampleRTT will vary from segment to segment; we want the estimated RTT to be “smoother”: average several recent measurements, not just the current SampleRTT

TCP maintains an average called EstimatedRTT and uses it to calculate the timeout value

Transport Layer 3-13

TCP Round Trip Time (RTT) and Timeout

EstimatedRTT = (1 - α) * priorEstimatedRTT + α * currentSampleRTT

Exponential Weighted Moving Average (EWMA)

Puts more weight on recent samples than on old ones

Influence of a past sample decreases exponentially fast

Typical value: α = 0.125

Formula becomes:

EstimatedRTT = 0.875 * priorEstimatedRTT + 0.125 * currentSampleRTT

Why TCP ignores retransmissions when calculating SampleRTT: Suppose the source sends packet P1, the timer for P1 expires, and the source then sends P2, a new copy of the same packet. Further suppose the source measures SampleRTT for P2 (the retransmitted packet) and that shortly after transmitting P2 an acknowledgment for P1 arrives. The source will mistakenly take this acknowledgment as an acknowledgment for P2 and calculate an incorrect value of SampleRTT.
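To see how the moving average smooths out a single outlier, the update rule can be applied to a short series of samples. This is a minimal sketch; the starting estimate and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def update_estimated_rtt(prior_estimate, sample_rtt, alpha=0.125):
    """One EWMA step: the new sample carries weight alpha, the history carries 1 - alpha."""
    return (1 - alpha) * prior_estimate + alpha * sample_rtt

estimate = 100.0                               # hypothetical starting estimate, in ms
history = []
for sample in [100.0, 180.0, 100.0, 100.0]:    # hypothetical SampleRTT values, in ms
    estimate = update_estimated_rtt(estimate, sample)
    history.append(estimate)

print(history)   # [100.0, 110.0, 108.75, 107.65625]
```

The 180 ms spike raises the estimate by only 10 ms, and its influence then decays by a factor of 0.875 with every later sample.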

Transport Layer 3-14

RTT Sample Ambiguity

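Karn’s RTT Estimator

If a segment has been retransmitted:

• Don’t count RTT samples on ACKs for this segment
• Keep the backed-off time-out for the next packet
• Reuse the RTT estimate only after one successful transmission

Figure: two timelines between A and B, each showing an original transmission, a retransmission and one ACK. In the first, SampleRTT is measured from the original transmission to the ACK; in the second, from the retransmission to the ACK. The sender cannot tell which transmission the ACK answers, so the sample is ambiguous.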

Transport Layer 3-15

Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr

Figure: SampleRTT and EstimatedRTT plotted against time (seconds, 1 to 106); the y-axis is RTT (milliseconds, 100 to 350).

Transport Layer 3-16

TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus a “safety margin”

large variation in EstimatedRTT -> larger safety margin

First, estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|

(typically, β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4 * DevRTT
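The estimator and the timeout rule fit together as sketched below. Two details are assumptions that the slide does not specify: DevRTT starts at half the first sample (the initialisation used in RFC 6298), and DevRTT is updated with the EstimatedRTT from before the new sample. The class name and the sample values are hypothetical.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval from them."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial value, as in RFC 6298

    def update(self, sample_rtt):
        # Assumption: the deviation is measured against the estimate *before* this sample.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)       # hypothetical samples, in ms
initial_timeout = est.timeout_interval()     # 100 + 4 * 50
est.update(120.0)
later_timeout = est.timeout_interval()       # 102.5 + 4 * 42.5
```

The safety margin shrinks as samples stay close to the estimate and grows again whenever they scatter.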

Transport Layer 3-17

TCP: conn-oriented transport

segment structure
RTT estimation and timeout
reliable data transfer
flow control
connection management

Transport Layer 3-18

TCP reliable data transfer

TCP creates rdt service on top of IP’s unreliable service

Pipelined segments

Cumulative acks

TCP uses a single retransmission timer, as multiple timers require considerable overhead

Retransmissions are triggered by: timeout events, duplicate acks

Initially consider a simplified TCP sender: ignore duplicate acks; ignore flow control and congestion control

Transport Layer 3-19

TCP sender events:

data rcvd from app:
Create a segment with a seq #; the seq # is the byte-stream number of the first data byte in the segment
Start the timer if it is not already running for some other segment (think of the timer as being for the oldest unacknowledged segment)
Expiration interval: TimeOutInterval

timeout:
Retransmit the segment that caused the timeout
Restart the timer

Ack rcvd:
If a valid ACK field (cumulative ACK) acknowledges previously unacknowledged segments: update the expected ACK #, and restart the timer if there are currently unacknowledged segments

Transport Layer 3-20

TCP sender(simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }

} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
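The pseudocode above can be turned into a small event-driven object for experimentation. This is a sketch under stated assumptions, not a TCP implementation: the class and attribute names are made up, the timer is a boolean flag rather than a real clock, "passing a segment to IP" just appends to a list, and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    """One timer, cumulative ACKs, no duplicate-ACK handling (as in the simplified sender)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}             # seq # -> data, for segments not yet acknowledged
        self.timer_running = False    # stand-in for a real retransmission timer
        self.wire = []                # (seq, data) tuples "passed to IP", for inspection

    def on_data_from_app(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)       # not-yet-acknowledged segment with smallest seq #
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True     # restart the timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # keep only segments that still contain unacknowledged bytes
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)


# Replays a lost-ACK situation: 8 bytes sent at seq 92, timeout, retransmission, then ACK=100.
sender = SimplifiedTcpSender(initial_seq_num=92)
sender.on_data_from_app(b"8 bytes!")
sender.on_timeout()
sender.on_ack(100)
```

After the final ACK nothing is outstanding, so the timer flag is cleared and SendBase equals NextSeqNum.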

Transport Layer 3-21

TCP: retransmission scenarios

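Two scenarios. In each, Host A sends to Host B and time runs downward.

Lost ACK scenario:

1. A -> B: Seq=92, 8 bytes data
2. B -> A: ACK=100, lost (X)
3. Timeout for Seq=92 at A
4. A -> B: Seq=92, 8 bytes data (retransmission)
5. B -> A: ACK=100
6. SendBase = 100

Premature timeout scenario:

1. A -> B: Seq=92, 8 bytes data
2. A -> B: Seq=100, 20 bytes data
3. B -> A: ACK=100 and ACK=120, both still in flight when the Seq=92 timer expires
4. Timeout for Seq=92 at A: transmit the not-yet-acked segment with the smallest seq #, so A -> B: Seq=92, 8 bytes data, and the timer is restarted
5. ACK=100 arrives (SendBase = 100), then ACK=120 arrives (SendBase = 120)
6. B -> A: ACK=120 in response to the duplicate Seq=92 segment; SendBase stays at 120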

Transport Layer 3-22

TCP retransmission scenarios (more)

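Cumulative ACK scenario (Host A sends to Host B; time runs downward):

1. A -> B: Seq=92, 8 bytes data
2. A -> B: Seq=100, 20 bytes data
3. B -> A: ACK=100, lost (X)
4. B -> A: ACK=120, arrives before the timeout
5. SendBase = 120; the cumulative ACK covers both segments, so nothing is retransmitted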

TCP implementations double the timeout value on every retransmission, since the timeout could have occurred because the network is congested. The intervals therefore grow exponentially with each successive retransmission. After either of the two other events (data received from the application, or ACK received), the timeout is reset to the value computed from EstimatedRTT and DevRTT.
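The doubling rule amounts to a three-way choice made whenever the timer is restarted. The sketch below is illustrative: the function name and the event labels are made up, and `computed_timeout` stands for whatever EstimatedRTT + 4 * DevRTT currently yields.

```python
def next_timeout(current_timeout, event, computed_timeout):
    """Return the timeout to use when the timer is (re)started after `event`."""
    if event == "timeout":
        return 2 * current_timeout        # back off: double after each retransmission
    if event in ("data_from_app", "ack_received"):
        return computed_timeout           # reset to the RTT-derived value
    raise ValueError(f"unknown event: {event}")

timeout = 0.75                            # hypothetical starting value, in seconds
backoff = []
for _ in range(3):                        # three retransmissions in a row
    timeout = next_timeout(timeout, "timeout", computed_timeout=0.75)
    backoff.append(timeout)

after_ack = next_timeout(timeout, "ack_received", computed_timeout=0.75)
```

Three consecutive timeouts stretch the interval from 0.75 s to 6 s; a single ACK brings it back.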

Transport Layer 3-23

TCP ACK generation policy [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

2. Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments.

3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte.

4. Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap.

Note: this policy leaves buffering of out-of-order segments open.

Transport Layer 3-24

Fast Retransmit

Time-out period is often relatively long: long delay before resending a lost packet

Detect lost segments via duplicate ACKs

A dup ACK is an ACK that re-acknowledges the receipt of an already acknowledged segment

Sender often sends many segments back-to-back

If a segment is lost, there will likely be many duplicate ACKs

If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the last ACKed segment was lost

The sender then performs a fast retransmit: it resends the segment before that segment’s timer expires

The algorithm comes as a result of 15 years of TCP experience!

Transport Layer 3-25

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for an already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
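The duplicate-ACK branch can be exercised in isolation with a small counter. This is an illustrative sketch: the class name is made up, only ACKs equal to SendBase are counted as duplicates (older ACK values are ignored, which is an assumption), and "resending" is recorded in a list rather than performed.

```python
class FastRetransmitTracker:
    """Counts duplicate ACKs and records when a fast retransmit would fire."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0           # duplicate ACKs seen for the current send_base
        self.retransmitted = []     # seq #s that would be resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y      # new data acknowledged
            self.dup_acks = 0
        elif y == self.send_base:   # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment with sequence number y


tracker = FastRetransmitTracker(send_base=92)
for y in [100, 100, 100, 100]:      # one new ACK followed by three duplicates
    tracker.on_ack(y)
```

The first ACK=100 advances SendBase; the third repeat of it triggers the retransmission of the segment starting at byte 100.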

Transport Layer 3-26

Is TCP a GBN or SR protocol ?

TCP can buffer out-of-order segments (like SR).

TCP has a proposed RFC called selective acknowledgement to selectively acknowledge out-of-order segments and save on retransmissions (like SR).

The TCP sender need only maintain the smallest seq # of a transmitted but unacknowledged byte and the seq # of the next byte to be sent (like GBN).

TCP is a hybrid between GBN and SR.

Transport Layer 3-27


Transport Layer 3-28

TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from the buffer

speed-matching service: matching the send rate to the receiving app’s drain rate

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

Transport Layer 3-29

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

sender maintains a variable called receive window

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

TCP is not allowed to overflow the allocated buffer (LastByteRcvd - LastByteRead <= RcvBuffer)

Rcvr advertises spare room by including the value of RcvWindow in segments

RcvWindow = RcvBuffer at the start of transmission

Sender limits unACKed data to RcvWindow

sender keeps track of UnAcked data size = (LastByteSent - LastByteAcked)

UnAcked data size <= RcvWindow

When the receiver’s RcvWindow = 0, the sender does not block but rather sends 1-byte segments that are acked by the receiver until RcvWindow becomes bigger.
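The window arithmetic on both sides can be written out directly. This is a minimal sketch with made-up function names and hypothetical byte counts; the zero-window case returns 1 to stand for the 1-byte probe segment.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room in the buffer, advertised as RcvWindow."""
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data must fit in the receive buffer")
    return rcv_buffer - buffered

def bytes_sender_may_send(last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: keep unACKed data within the advertised window."""
    if advertised_window == 0:
        return 1                       # keep probing with 1-byte segments
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)

# Hypothetical numbers: a 4096-byte buffer holding 2000 unread bytes.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
allowed = bytes_sender_may_send(last_byte_sent=5000, last_byte_acked=4000,
                                advertised_window=window)
probe = bytes_sender_may_send(5000, 5000, advertised_window=0)
```

With 2096 bytes advertised and 1000 bytes already in flight, the sender may put 1096 more bytes on the wire.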

Transport Layer 3-30


Transport Layer 3-31

Recap: TCP socket interaction

Server (running on hostid):

1. create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
2. wait for incoming connection request: connectionSocket = welcomeSocket.accept()
3. read request from connectionSocket
4. write reply to connectionSocket
5. close connectionSocket

Client:

1. create socket, connect to hostid, port=x: clientSocket = Socket()
2. send request using clientSocket
3. read reply from clientSocket
4. close clientSocket

TCP connection setup takes place between the client's Socket() call and the server's accept().

Transport Layer 3-32

TCP Connection Management

Recall: TCP sender and receiver establish a “connection” before exchanging data segments

initialize TCP variables: seq. #s, buffers, flow control info (e.g. RcvWindow)

client: connection initiator

Socket clientSocket = new Socket("hostname", portNumber);

server: contacted by client

Socket connectionSocket = welcomeSocket.accept();


Transport Layer 3-33

TCP Connection Management - connecting

Three-way handshake:

Step 1: client host sends a TCP SYN segment (SYN bit=1) to the server
• specifies initial seq # (client_isn)
• no data

Step 2: server host receives the SYN, replies with a SYNACK segment
• server allocates buffers
• specifies server initial seq. # (server_isn), with ACK # = client_isn+1

Step 3: client receives the SYNACK, replies with ACK # = server_isn+1; this segment may contain data

Message sequence (time runs downward):

1. client -> server (conn request): SYN=1, seq=client_isn
2. server -> client (conn granted): SYN=1, seq=server_isn, ack=client_isn+1
3. client -> server (ACK): SYN=0, seq=client_isn+1, ack=server_isn+1

Transport Layer 3-34

TCP Connection Setup Example

Client SYN: SeqC = 4019802004, window 65535, max. seg. 1260

Server SYN-ACK: acknowledges 4019802005 (= SeqC+1); SeqS = 3428951569, window 5840, max. seg. 1460

Client ACK: acknowledges 3428951570 (= SeqS+1)

09:23:33.042318 IP 128.2.222.198.3123 > 192.216.219.96.80: S 4019802004:4019802004(0) win 65535 <mss 1260,nop,nop,sackOK>

09:23:33.118329 IP 192.216.219.96.80 > 128.2.222.198.3123: S 3428951569:3428951569(0) ack 4019802005 win 5840 <mss 1460,nop,nop,sackOK>

09:23:33.118405 IP 128.2.222.198.3123 > 192.216.219.96.80: . ack 3428951570 win 65535

sackOK: selective acknowledge
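The sequence and ack numbers in the trace above follow mechanically from the two initial sequence numbers. The sketch below recomputes them; the function name and the dictionary layout are illustrative, and the modulo reflects the fact that TCP sequence numbers are 32-bit values that wrap around.

```python
SEQ_SPACE = 2**32   # TCP sequence numbers are 32 bits wide

def handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK header fields of a three-way handshake."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": (client_isn + 1) % SEQ_SPACE}
    ack = {"SYN": 0,
           "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return syn, synack, ack

# Initial sequence numbers taken from the tcpdump trace above.
syn, synack, ack = handshake(client_isn=4019802004, server_isn=3428951569)

# An ISN at the top of the sequence space wraps to 0 in the acknowledgement.
wrapped = handshake(client_isn=2**32 - 1, server_isn=0)[1]["ack"]
```

The computed `ack` fields match the `ack 4019802005` and `ack 3428951570` values printed by tcpdump.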

Transport Layer 3-35

TCP Connection Management - disconnecting

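Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends a TCP FIN control segment (FIN bit=1) to the server

Step 2: server receives the FIN, replies with an ACK. It then closes the connection and sends its own FIN (FIN bit=1).

Figure: message sequence between client and server. The client closes and sends FIN; the server replies with ACK, closes, and sends its own FIN; the client replies with ACK and enters timed wait, after which the connection is closed.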

Transport Layer 3-36

TCP Connection Management (cont.)

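Step 3: client receives the FIN, replies with an ACK.

It enters “timed wait”, during which it will respond with an ACK to any received FINs; a typical wait is 30 sec. After the wait, all resources and ports are released.

Step 4: server receives the ACK. Connection closed.

Figure: the same FIN / ACK / FIN / ACK exchange. Both client and server pass through closing; the client goes through timed wait before it is closed, and the server is closed once it receives the final ACK.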

Transport Layer 3-37

TCP Conn.Teardown Example

Session Echo client on 128.2.222.198, server on 128.2.210.194

Client FIN SeqC: 1489294581

Server ACK + FIN Ack: 1489294582 (= SeqC+1) SeqS: 1909787689

Client ACK Ack: 1909787690 (= SeqS+1)

09:54:17.585396 IP 128.2.222.198.4474 > 128.2.210.194.6616: F 1489294581:1489294581(0) ack 1909787689 win 65434

09:54:17.585732 IP 128.2.210.194.6616 > 128.2.222.198.4474: F 1909787689:1909787689(0) ack 1489294582 win 5840

09:54:17.585764 IP 128.2.222.198.4474 > 128.2.210.194.6616: . ack 1909787690 win 65434

Transport Layer 3-38

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Transport Layer 3-39

Queue Management

Two queues for each listening socket

Transport Layer 3-40

Concurrent Server

/* Capitalized calls (Socket, Bind, Listen, Accept, Fork, Close) are assumed to be
   error-checking wrappers around the corresponding system calls. */
pid_t pid;
int listenfd, connfd;

listenfd = Socket( ... );

/* fill in sockaddr_in{} with server's well-known port */
Bind(listenfd, ... );
Listen(listenfd, LISTENQ);

for ( ; ; ) {
    connfd = Accept(listenfd, ... );    /* probably blocks */
    if ( (pid = Fork()) == 0) {
        Close(listenfd);    /* child closes listening socket */
        doit(connfd);       /* process the request */
        Close(connfd);      /* done with this client */
        exit(0);            /* child terminates */
    }
    Close(connfd);          /* parent closes connected socket */
}

Transport Layer 3-41

Concurrent Server (Cont’)

(a) Status before call to accept returns

(b) Status after return from accept

(c) Status after return of spawning a process

(d) Status after parent/child close appropriate sockets

Transport Layer 3-42

TCP Summary

TCP properties: point to point, connection-oriented, full-duplex, reliable

TCP segment structure

How TCP sequence and acknowledgement #s are assigned

How TCP measures the timeout value needed for retransmissions, using EstimatedRTT and DevRTT

TCP retransmission scenarios, ACK generation and fast retransmit

How TCP flow control works

TCP connection management: 3 segments exchanged to connect and 4 segments exchanged to disconnect