Transport Layer: Outline - TU Berlin


Transport Layer: Outline

- Transport-layer services
- Multiplexing and demultiplexing
- Connectionless transport: UDP
- Principles of reliable data transfer
- Connection-oriented transport: TCP
  - Segment structure
  - Reliable data transfer
  - Connection management
  - Flow control
- Principles of congestion control
- TCP congestion control


TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

- Point-to-point:
  - One sender, one receiver
- Reliable, in-order byte stream:
  - No "message boundaries"
- Pipelined:
  - TCP congestion and flow control set window size
- Send and receive buffers
- Full duplex data:
  - Bi-directional data flow in same connection
  - MSS: maximum segment size
- Connection-oriented:
  - Handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- Flow controlled:
  - Sender will not overwhelm receiver
- Congestion controlled:
  - Sender will not overwhelm network

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the other application reads data through its socket door.]


TCP segment structure

Header layout (each row is 32 bits wide):

  | source port #                     | dest port #        |
  | sequence number                                        |
  | acknowledgement number                                 |
  | head len | not used | U A P R S F | rcvr window size   |
  | checksum                          | ptr urgent data    |
  | options (variable length)                              |
  | application data (variable length)                     |

Field notes:

- Sequence number, acknowledgement number: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- Checksum: Internet checksum (as in UDP)
- Rcvr window size: # bytes receiver is willing to accept
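The fixed 20-byte part of the header described above can be decoded with a few lines of Python. This is a minimal sketch, not a full parser: the function name `parse_tcp_header` and the example field values are made up for illustration, and options are ignored.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (hypothetical helper, options ignored)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    # The header-length field counts 32-bit words.
    head_len = (offset_flags >> 12) * 4
    # Flag bits from least significant upward: FIN, SYN, RST, PSH, ACK, URG.
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    flags = [name for bit, name in enumerate(names) if offset_flags & (1 << bit)]
    return {"src_port": src_port, "dst_port": dst_port, "seq": seq, "ack": ack,
            "head_len": head_len, "flags": flags, "window": window,
            "checksum": checksum, "urg_ptr": urg_ptr}

# Made-up example: a pure ACK with a 5-word header and a 4096-byte window.
example = struct.pack("!HHIIHHHH", 1234, 80, 42, 79, (5 << 12) | 0x10, 4096, 0, 0)
header = parse_tcp_header(example)
```

Here `header["flags"]` contains only `"ACK"`, and `header["window"]` is the number of bytes the receiver is willing to accept.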


TCP reliable data transfer

- TCP creates rdt service on top of IP's unreliable service
- Pipelined segments
- Cumulative ACKs
- TCP uses single retransmission timer
- Retransmissions are triggered by:
  - Timeout events
  - Duplicate ACKs
- Initially consider simplified TCP sender:
  - Ignore duplicate ACKs
  - Ignore flow control, congestion control
  - One-way data flow


TCP seq. #'s and ACKs

Seq. #'s:
- Byte-stream "number" of first byte in segment's data

ACKs:
- Seq # of next byte expected from other side
- Cumulative ACK

Q: How does the receiver handle out-of-order segments?
- A: TCP spec doesn't say; it is up to the implementer

Simple telnet scenario (time runs downward):

  Host A -> Host B: Seq=42, ACK=79, data = 'C'   (user types 'C')
  Host B -> Host A: Seq=79, ACK=43, data = 'C'   (host ACKs receipt of 'C', echoes back 'C')
  Host A -> Host B: Seq=43, ACK=80               (host ACKs receipt of echoed 'C')
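The numbers in the telnet scenario follow one rule: a reply carries as its sequence number the ACK value it just received, and as its ACK value the received sequence number plus the received data length. A small sketch (the helper name `reply_numbers` is hypothetical):

```python
def reply_numbers(rcv_seq: int, rcv_ack: int, rcv_len: int) -> tuple:
    """Return (seq, ack) for the reply to a segment carrying rcv_len data bytes."""
    # Our next byte is the one the peer says it expects;
    # we expect the byte right after the data we just received.
    return rcv_ack, rcv_seq + rcv_len

# Host A sends Seq=42, ACK=79 with one byte of data ('C').
b_reply = reply_numbers(42, 79, 1)          # Host B echoes 'C'
a_reply = reply_numbers(*b_reply, 1)        # Host A ACKs the echoed 'C'
```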


TCP sender events

Data rcvd from app:
- Create segment with seq #
- Seq # is byte-stream number of first data byte in segment
- Start timer if not already running (think of timer as for oldest unacked segment)
- Expiration interval: TimeOutInterval

Timeout:
- Retransmit the one segment that caused timeout
- Restart timer

Ack rcvd:
- If it acknowledges previously unacked segments:
  - Update what is known to be acked
  - Restart timer if there are outstanding segments


TCP: reliable data transfer

Simplified sender, assuming:
- One-way data transfer
- No flow, congestion control

[Figure: sender state machine with a single "wait for event" state and three transitions]
- event: data received from application above -> create, send segment
- event: timer timeout -> retransmit segment
- event: ACK received, with ACK # y -> ACK processing


TCP sender (simplified)

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
      switch (event)

      event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer currently not running)
          start timer
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

      event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        restart timer

      event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
          sendbase = y
          if (currently not-yet-acknowledged segments) {
            restart timer
          }
        }
    } /* end of loop forever */
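The simplified sender can be turned into a small runnable model. The sketch below is illustrative only: the class name, the `wire` log standing in for "pass segment to IP", and the boolean timer are assumptions, and stopping the timer when nothing is outstanding is an addition the pseudocode does not spell out.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified sender: one-way data, no flow or congestion control."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}            # seq -> payload of not-yet-acknowledged segments
        self.timer_running = False   # stands in for a real timer
        self.wire = []               # log of (seq, payload) handed to IP

    def on_app_data(self, data: bytes):
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # smallest not-yet-acknowledged sequence number
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True  # restart timer

    def on_ack(self, y: int):
        if y > self.send_base:       # cumulative ACK of all data up to y
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Restart if segments remain outstanding, otherwise stop (assumption).
            self.timer_running = bool(self.unacked)

# Lost-ACK case: 8 bytes at Seq=92, timeout, retransmission, then ACK=100.
lost_ack = SimplifiedTcpSender(92)
lost_ack.on_app_data(b"a" * 8)
lost_ack.on_timeout()
lost_ack.on_ack(100)

# Cumulative-ACK case: ACK=100 is lost, ACK=120 covers both segments.
cumulative = SimplifiedTcpSender(92)
cumulative.on_app_data(b"a" * 8)
cumulative.on_app_data(b"b" * 20)
cumulative.on_ack(120)
```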


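TCP retransmission scenarios

Lost ACK scenario (time runs downward):

  Host A -> Host B: Seq=92, 8 bytes data
  Host B -> Host A: ACK=100              (lost, X)
  Timeout at Host A
  Host A -> Host B: Seq=92, 8 bytes data (retransmission)
  Host B -> Host A: ACK=100
  SendBase = 100

Premature timeout scenario (time runs downward):

  Host A -> Host B: Seq=92, 8 bytes data
  Host A -> Host B: Seq=100, 20 bytes data
  Host B -> Host A: ACK=100
  Host B -> Host A: ACK=120
  Seq=92 timeout at Host A before the ACKs arrive
  Host A -> Host B: Seq=92, 8 bytes data (retransmission)
  SendBase = 100, then SendBase = 120 as the ACKs arrive
  Host B -> Host A: ACK=120
  SendBase = 120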


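TCP retransmission scenarios (2.)

Cumulative ACK scenario (time runs downward):

  Host A -> Host B: Seq=92, 8 bytes data
  Host A -> Host B: Seq=100, 20 bytes data
  Host B -> Host A: ACK=100              (lost, X)
  Host B -> Host A: ACK=120              (arrives before the timeout)
  SendBase = 120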


TCP round trip time and timeout

Q: How to set TCP timeout value?

- Longer than RTT
  - Note: RTT will vary
- Too short: premature timeout
  - Unnecessary retransmissions
- Too long: slow reaction to segment loss

Q: How to estimate RTT?

- SampleRTT: measured time from segment transmission until ACK receipt
  - Ignore retransmissions, cumulatively ACKed segments
- SampleRTT will vary, want estimated RTT "smoother"
  - Use several recent measurements, not just current SampleRTT


TCP round trip time and timeout

EstimatedRTT = (1 - α)* EstimatedRTT + α * SampleRTT

- Exponential weighted moving average
- Influence of given sample decreases exponentially fast
- Typical value of α: 0.125
- Key observation:
  - At high loads round trip variance is high
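To see how the moving average damps a single outlier, the update rule can be applied to a short list of samples. This is a sketch with made-up sample values; seeding the estimate with the first sample is an assumption, since the slide does not say how EstimatedRTT is initialized.

```python
def ewma(samples, alpha=0.125):
    """Apply EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT to each sample."""
    estimated = samples[0]          # assumption: seed with the first sample
    history = [estimated]
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append(estimated)
    return history

samples_ms = [100, 100, 300, 100, 100]   # made-up SampleRTT values with one spike
smoothed = ewma(samples_ms)
```

The 300 ms spike moves the estimate only to 125 ms, and its influence shrinks by a factor of (1 - α) with every later sample.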


Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT.]


TCP round trip time and timeout

Setting the timeout

- EstimatedRTT plus "safety margin"
  - Large variation in EstimatedRTT -> larger safety margin
- First estimate how much SampleRTT deviates from EstimatedRTT:

  DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|

  (typically, β = 0.25)

- Then set timeout interval:

  TimeoutInterval = EstimatedRTT + 4 * DevRTT
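The timeout calculation (EstimatedRTT plus four times DevRTT) can be sketched as a small class. The class name, the initial DevRTT of half the first sample, and the choice to update DevRTT before EstimatedRTT are assumptions borrowed from common practice, not something the slides specify.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval (illustrative sketch)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: initial deviation

    def update(self, sample_rtt):
        # Assumption: deviation is measured against the previous EstimatedRTT.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + \
            self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + \
            self.alpha * sample_rtt

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

estimator = RttEstimator(first_sample=100)   # milliseconds, made-up values
estimator.update(200)
```

After the 200 ms sample the estimate rises modestly while the deviation term widens the safety margin much more.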


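Retransmission ambiguity

[Figure: two timelines between hosts A and B, each showing an original transmission, an RTO expiry, a retransmission, and an ACK; in the second timeline a loss is marked with X. The sender cannot tell whether the ACK answers the original transmission or the retransmission, so SampleRTT may be measured against the wrong one.]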


Karn’s RTT estimator

- Accounts for retransmission ambiguity
  - If a segment has been retransmitted: don't count RTT sample on ACKs for this segment
- If retransmission timer expires
  - Double retransmission TimeoutInterval
  - Do not use RTT estimate to calculate TimeoutInterval until successful retransmission
- Timer restarted (not due to timeout)
  - Reuse RTT estimate
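A sketch of how these rules might combine with the timeout formula is shown below. The class and method names are hypothetical, and one point is an interpretation: the doubled timeout is kept until an ACK arrives for a segment that was never retransmitted, which is one common reading of the rule.

```python
class KarnRto:
    """Timeout with exponential backoff and Karn's sample filtering (illustrative)."""

    def __init__(self, estimated_rtt, dev_rtt, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.alpha = alpha
        self.beta = beta
        self.backoff = 1

    @property
    def timeout_interval(self):
        return (self.estimated_rtt + 4 * self.dev_rtt) * self.backoff

    def on_timer_expired(self):
        self.backoff *= 2                 # double the retransmission timeout

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return                        # ambiguous sample: do not count it
        self.backoff = 1                  # unambiguous sample: use the estimate again
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + \
            self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + \
            self.alpha * sample_rtt

rto = KarnRto(estimated_rtt=100, dev_rtt=10)   # made-up starting values in ms
before = rto.timeout_interval
rto.on_timer_expired()
after_timeout = rto.timeout_interval
rto.on_ack(sample_rtt=500, was_retransmitted=True)
after_ambiguous_ack = rto.timeout_interval
rto.on_ack(sample_rtt=100, was_retransmitted=False)
after_clean_ack = rto.timeout_interval
```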


Timestamp extension

- Used to improve timeout mechanism by more accurate measurement of RTT
- When sending a packet, insert current timestamp into option
  - 4 bytes for the sender's timestamp value, 4 bytes for the echoed timestamp
- Receiver echoes timestamp in ACK
  - Actually will echo whatever is in timestamp
- Removes retransmission ambiguity
  - Can get RTT sample on any packet


Timer granularity

- Many TCP implementations set RTO in multiples of 200, 500, 1000 ms
- Why?
  - Avoid spurious timeouts: RTTs can vary quickly due to cross traffic
  - Make timer interrupts efficient


Fast retransmit

- Time-out period often relatively long:
  - Long delay before resending lost packet
- Detect lost segments via duplicate ACKs.
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs.
- If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - Fast retransmit: resend segment before timer expires


Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          restart timer
      }
      else { /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
          resend segment with sequence number y /* fast retransmit */
        }
      }
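The duplicate-ACK counting can be modeled in a few lines. This sketch uses hypothetical names, keeps a single counter for the current SendBase, and records retransmissions in a list instead of sending anything; it only counts ACKs whose value equals SendBase, which is a simplifying assumption.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers a fast retransmit on the third one."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent before any timeout

    def on_ack(self, y: int):
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1       # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit segment y

sender = FastRetransmitSender(send_base=100)
for ack in (100, 100, 100, 100):     # four duplicates: only the third one triggers
    sender.on_ack(ack)
resent_after_dups = list(sender.retransmitted)
sender.on_ack(120)                   # new cumulative ACK resets the counter
```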


Delayed ACK

- It is inefficient to send too many ACK-only packets
- Why?
  - No data => more than 40 bytes for 1 byte of information
- Goal:
  - Wait for additional data to piggyback the ACK on a data packet
- Implementation
  - Try not to ACK every packet, but only every second one
  - Wait for at most 200 ms
  - ACK any out-of-order data


TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: In-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK

Event at receiver: In-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: Immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: Out-of-order segment with higher-than-expected seq #. Gap detected
  TCP receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: Segment that partially or completely fills gap
  TCP receiver action: Immediately send ACK, provided that segment starts at lower end of gap
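The four receiver rules can be written as one decision function. This is only a sketch: the function name, the boolean inputs, and the returned strings are made up for illustration, and segments below the expected sequence number are outside the table, so the function just says so.

```python
def ack_action(seg_seq: int, expected_seq: int, ack_pending: bool, gap_exists: bool) -> str:
    """Map an arriving segment to the receiver action from the ACK generation rules."""
    if seg_seq > expected_seq:
        # Out-of-order segment: gap detected (or not at the lower end of the gap).
        return "send duplicate ACK immediately"
    if seg_seq == expected_seq:
        if gap_exists:
            # Segment starts at the lower end of the gap.
            return "send ACK immediately"
        if ack_pending:
            return "send cumulative ACK immediately"
        return "delay ACK up to 500 ms"
    return "not covered by the rules"   # assumption: old data is left unspecified here

in_order = ack_action(100, 100, ack_pending=False, gap_exists=False)
second_in_order = ack_action(100, 100, ack_pending=True, gap_exists=False)
out_of_order = ack_action(200, 100, ack_pending=False, gap_exists=False)
fills_gap = ack_action(100, 100, ack_pending=False, gap_exists=True)
```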


TCP connection management

Recall: TCP sender, receiver establish “connection” before exchanging data segments

- Initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator

    Socket clientSocket = new Socket("hostname", portNumber);

- server: contacted by client

    Socket connectionSocket = welcomeSocket.accept();


Connection establishment

- Use 3-way handshake

  A -> B: SYN + Seq A
  B -> A: SYN+ACK-A + Seq B
  A -> B: ACK-B


Sequence number selection

- Why not simply choose 0?
- Must avoid overlap with earlier incarnation


TCP connection: Three-way handshake

Step 1: Client end system sends TCP SYN control segment to server
- Specifies initial seq #
- Specifies initial window #

Step 2: Server end system receives SYN, replies with SYNACK control segment
- ACKs received SYN
- Allocates buffers
- Specifies server -> receiver initial seq. #
- Specifies initial window #

Step 3: Client system receives SYNACK, replies with ACK segment, which may contain data
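The sequence and acknowledgement numbers exchanged in the three steps can be worked out with a short sketch. The function name and dictionary layout are hypothetical; the one protocol fact relied on is that a SYN consumes one sequence number, so each side acknowledges the peer's initial sequence number plus one.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as simple dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"flags": {"ACK"}, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]

# Made-up initial sequence numbers for illustration.
syn, synack, ack = three_way_handshake(client_isn=1000, server_isn=5000)
```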


TCP connection management (2.)

Closing a connection:

client closes socket:

clientSocket.close();

Step 1: Client end system sends TCP FIN control segment to server

Step 2: Server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client sends FIN (close); server replies with ACK, then sends its own FIN (close).]


TCP connection management (3.)

Step 3: Client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs

Step 4: Server receives ACK. Connection closed.

Note: With small modification, can handle simultaneous FINs.

[Figure: client sends FIN (closing); server replies with ACK and sends its FIN (closing); client replies with ACK and stays in timed wait before it is closed; server is closed when the ACK arrives.]


Tear-down packet exchange

[Figure: packet exchange between Sender and Receiver with the labels FIN, FIN-ACK, FIN, FIN-ACK, Data write, Data ack.]


TCP connection management (cont.)

TCP client lifecycle


TCP connection management (cont.)

TCP server lifecycle


Detecting half-open connections

      TCP A                                          TCP B

  1.  (CRASH)                                        (send 300, receive 100)
  2.  CLOSED                                         ESTABLISHED
  3.  SYN-SENT --> <SEQ=400><CTL=SYN>            --> (??)
  4.  (!!)     <-- <SEQ=300><ACK=100><CTL=ACK>   <-- ESTABLISHED
  5.  SYN-SENT --> <SEQ=100><CTL=RST>            --> (Abort!!)
  6.  SYN-SENT                                       CLOSED
  7.  SYN-SENT --> <SEQ=400><CTL=SYN>            -->


TCP flow control

- Receive side of TCP connection has a receive buffer
- Speed-matching service: match the send rate to the receiving app's drain rate
- App process may be slow at reading from buffer

Flow control: sender won't overflow receiver's buffer by transmitting too much, too fast


TCP flow control: How it works

(Suppose TCP receiver discards out-of-order segments)

- Spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
- Rcvr advertises spare room by including value of RcvWindow in segments
- Sender limits unACKed data to RcvWindow
  - Guarantees receive buffer doesn't overflow
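The window arithmetic can be checked with two small functions. The names `rcv_window` and `sender_may_send`, the sender-side variables `last_byte_sent` and `last_byte_acked`, and the example numbers are illustrative assumptions.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent: int, last_byte_acked: int, window: int) -> int:
    """Bytes the sender may still send while keeping unACKed data within the window."""
    return max(0, window - (last_byte_sent - last_byte_acked))

# Receiver: 4096-byte buffer, 3000 bytes received so far, 1000 read by the application.
window = rcv_window(4096, 3000, 1000)
# Sender: 1000 bytes are already in flight.
budget = sender_may_send(5000, 4000, window)
```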


TCP flow control: How it works (2.)

- TCP is a sliding window protocol
  - For window size n, can send up to n bytes without receiving an acknowledgement
  - When the data is acknowledged then the window slides forward
- Each packet advertises a window size
  - Indicates number of bytes the receiver has space for
- Original TCP always sent entire window
  - Congestion control now limits this


Window flow control: Sender side

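[Figure: the sender's byte stream divided into "sent and acked", "sent but not acked", and "not yet sent"; a window bracket is drawn over the stream and a pointer marks "next to be sent".]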


Window flow control: Receiver side

[Figure: the receive buffer, with regions labelled "acked but not delivered to user" and "not yet acked", and a bracket marking the window.]


TCP persist

- What happens if window is 0?
  - Receiver updates window (i.e., sends ACK with new window size) when application reads data
  - What if this update is lost?
- TCP persist state
  - Sender periodically sends 1-byte packets
  - Receiver responds with ACK even if it can't store the packet


Observed TCP problems

- Too many small packets
  - Silly window syndrome
  - Nagle's algorithm
- Initial sequence number selection
- Amount of state maintained


Silly window syndrome

- Problem: (Clark, 1982)
  - If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets
- Solution
  - Receiver must not advertise small window increases
  - Increase window by min(MSS, RecvBuffer/2)
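One way to read the receiver-side rule is that a larger window is advertised only once the increase reaches min(MSS, RecvBuffer/2). The sketch below follows that reading; the function name and the example sizes are assumptions.

```python
def advertised_window(current_adv: int, free_space: int, mss: int, rcv_buffer: int) -> int:
    """Return the window to advertise, suppressing small increases."""
    threshold = min(mss, rcv_buffer // 2)
    if free_space - current_adv >= threshold:
        return free_space        # increase is large enough to be worth advertising
    return current_adv           # keep the old value instead of a small increase

# Made-up numbers: MSS of 1460 bytes, 8192-byte receive buffer, window currently closed.
small_increase = advertised_window(current_adv=0, free_space=100, mss=1460, rcv_buffer=8192)
full_segment = advertised_window(current_adv=0, free_space=1460, mss=1460, rcv_buffer=8192)
```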


Nagle's algorithm

- Small packet problem:
  - Don't want to send a 41-byte packet for each keystroke
  - How long to wait for more data?
- Solution:
  - Allow only one outstanding small (not full sized) segment that has not yet been acknowledged
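The rule of allowing only one outstanding small segment reduces to a short send decision. This is a sketch of that rule as stated here, with hypothetical names; real implementations track more state.

```python
def nagle_can_send(pending_bytes: int, mss: int, small_segment_outstanding: bool) -> bool:
    """Decide whether the sender may transmit now under the one-small-segment rule."""
    if pending_bytes >= mss:
        return True                      # a full-sized segment can always go
    # A small segment may go only if no other small segment is still unacknowledged.
    return pending_bytes > 0 and not small_segment_outstanding

first_keystroke = nagle_can_send(1, 1460, small_segment_outstanding=False)
second_keystroke = nagle_can_send(1, 1460, small_segment_outstanding=True)
bulk_data = nagle_can_send(3000, 1460, small_segment_outstanding=True)
```

The second keystroke waits in the send buffer until the first small segment is acknowledged, so later keystrokes are batched into one segment.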


Why is selecting ISN important?

- Suppose machine X selects ISN based on predictable sequence
- Fred has .rhosts to allow login to X from Y
- Evil Ed attacks
  - Disables host Y: denial of service attack
  - Makes a bunch of connections to host X
  - Determines ISN pattern and guesses next ISN
  - Fake pkt1: [<src Y><dst X>, guessed ISN]
  - Fake pkt2: desired command


Time Wait issues

- Web servers, not clients, close connection first
  - Established -> Fin-Waits -> Time-Wait -> Closed
  - Why would this be a problem?
- Time-Wait state lasts for 2 * MSL
  - MSL should be 120 seconds (is often 60 s)
  - Servers often have an order of magnitude more connections in Time-Wait
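A back-of-the-envelope calculation shows why this matters for a busy server. The sketch assumes a steady rate of connections closed by the server, each lingering for 2 * MSL; the rate of 100 closes per second is a made-up figure.

```python
def time_wait_population(closes_per_second: float, msl_seconds: float) -> float:
    """Steady-state number of connections sitting in Time-Wait."""
    # Each closed connection lingers for 2 * MSL before its state is released.
    return closes_per_second * 2 * msl_seconds

with_msl_60 = time_wait_population(closes_per_second=100, msl_seconds=60)
with_msl_120 = time_wait_population(closes_per_second=100, msl_seconds=120)
```

Under these assumptions the server holds state for 12,000 connections with MSL = 60 s and 24,000 with MSL = 120 s.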
