Internet Transport Protocols UDP / TCP - TU Berlin · Internet Transport Protocols UDP / TCP Prof....

49
1 Internet Transport Protocols UDP / TCP Prof. Anja Feldmann, Ph.D. [email protected] TCP/IP Illustrated, Volume 1, W. Richard Stevens http://www.kohala.com/start

Transcript of Internet Transport Protocols UDP / TCP - TU Berlin · Internet Transport Protocols UDP / TCP Prof....

1

Internet Transport ProtocolsUDP / TCP

Prof. Anja Feldmann, [email protected]

TCP/IP Illustrated, Volume 1, W. Richard Stevens http://www.kohala.com/start

2

Transport Layer: Outline

r Transport-layer servicesr Multiplexing and

demultiplexingr Connectionless transport:

UDP

r Connection-oriented transport: TCPm Segment structurem Reliable data transferm Flow controlm Connection management

r Principles of congestion control

r TCP congestion control

3

Internet Transport-Layer Protocols

r Network layer: Logical communication between hosts

r Transport layer: Logical communication between processes m Relies on, enhances,

network layer services

r More than one transport protocol available to appsm Internet:

• TCP• UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysicalnetwork

data linkphysical

logical end-end transport

4

Sockets: interface to applications

Socket APIr Introduced in BSD4.1 UNIX,

1981r Explicitly created, used,

released by apps r Client/server paradigm r Two types of transport service

via socket API: m Unreliable datagram m Reliable, byte stream-

oriented

A host-local, application-created/owned,

OS-controlled interface (a “door”) into which

application process can both send and

receive messages to/from another (remote or

local) application process

socket

5

Sockets and OS

Socket: a door between application process and end-end-transport protocol (UCP or TCP)

process

TCP withbuffers,variables

socket

controlled byapplicationdeveloper

controlled byoperating

system

host orserver

process

TCP withbuffers,variables

socket

controlled byapplicationdeveloper

controlled byoperatingsystem

host orserver

internet

6

application

transport

network

link

physical

application

transport

network

link

physical

application

transport

network

link

physical

host 1 host 2 host 3

Multiplexing/Demultiplexing

P2 P4P1P3

= process= socket

Delivering received segmentsto correct application (socket)

Demultiplexing at rcv host:Gathering data from multipleappl. (sockets), enveloping data with header (later usedfor demultiplexing)

Multiplexing at send host:

7

Multiplexing/Demultiplexing

Multiplexing/demultiplexing:r Based on sender, receiver port

numbers, IP addressesm Source, dest port #s in each

segmentm Well-known port numbers

for specific applications(see /etc/services)

source port # dest port #

32 bits

applicationdata

(message)

other header fields

TCP/UDP segment format

8

Multiplexing/Demultiplexing: Examples

host A server Bsource port: xdest. port: 23

source port:23dest. port: x

Port use: simple telnet app

Source IP: CDest IP: B

source port: xdest. port: 80

WWW clienthost C

Source IP: CDest IP: B

source port: ydest. port: 80

WWW clienthost A

WWWserver B

Port use: WWW server

Source IP: ADest IP: B

source port: xdest. port: 80

9

UDP: User Datagram Protocol [RFC 768]

r “No frills,” “bare bones”Internet transport protocol

r “Best effort” service, UDP segments may be:m Lostm Delivered out of order to

applicationr Connectionless:

m No handshaking between UDP sender, receiver

m Each UDP segment handled independently of others

Why is there a UDP?r No connection establishment

(which can add delay)r Simple: no connection state

at sender, receiverr Small segment headerr No congestion control: UDP

can blast away as fast as desired

10

UDP: More

r Each user request transferred in a single datagram

r UDP has a receive buffer but no sender buffer

r Often used for streaming multimedia appsm Loss tolerantm Rate sensitive

r Other UDP uses (why?):m DNS, SNMP, NFS

r Reliable transfer over UDP: add reliability at application layer

source port # dest port #

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength, in

bytes of UDPsegment,including

header

11

UDP Checksum

rOnes complement of 16 bit words (same as IP)r Covers data plus a 12 byte pseudo headerm IP addresses, 0, protocol identifier, lengthm Ensures that packet has reached the correct host

r Pad byte in case of an odd packet length rOptional – Zero indicates no checksumm Should always be turned on

r Receiver has to verify checksum

12

TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581

r Connection-oriented:m Handshaking (exchange

of control msgs) init’ssender, receiver state before data exchange

r Flow controlled:m Sender will not

overwhelm receiver

r Congestion controlled:m Sender will not

overwhelm the network

r Point-to-point:m One sender, one receiver

r Reliable, in-order byte stream:m No “message boundaries”

r Pipelined:m TCP congestion and flow

control set window size

r Full duplex data:m Bi-directional data flow in

same connectionm MSS: maximum segment

size

13

Simulating Transport Protocols

r Network simulatorr Examples: m Network Simulator (NS), SSFNet, …

r Animation of NS traces via NAM (Network Animator)

r Try it!

14

Simulating Transport Protocols

r Example: 2 TCP connections + 1 UDP flowr Topology:

r TCP1 starts at time 0 seconds, TCP2 at time 3 secondsr UDP starts at time 15 seconds

TCP 1

2 Mb25 ms

380 Kb10 ms

TCP 2

UDP

TCP 1

TCP 2

UDP

Node 1 Node 2Node 0

15

Simulation Results

16

TCP Segment Structure

source port # dest port #

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberrcvr window size

ptr urgent datachecksum

FSRPAUheadlen

notused

Options (variable length)

URG: urgent data (generally not used)

ACK: ACK #valid

PSH: push data now(generally not used)

RST, SYN, FIN:connection estab(setup, teardown

commands)

# bytes rcvr willingto accept

countingby bytes of data(not segments!)

Internetchecksum

(as in UDP)

17

TCP Reliability: Seq. #’s and ACKsSeq. #’s:

m Byte stream“Number” of first byte in segment’s data

ACKs:m Seq # of next byte

expected from other side

m Cumulative ACKQ: Now receiver handles

out-of-order segmentsm A: TCP spec doesn’t

say, – up to implementer

Host A Host B

Seq=42, ACK=79, data = ‘C’

Seq=79, ACK=43, data = ‘C’

Seq=43, ACK=80

Usertypes

‘C’

host ACKsreceipt

of echoed‘C’

host ACKsreceipt of‘C’, echoes

back ‘C’

timesimple telnet scenario

18

TCP: Reliable Data Transfer

r Packet loss detection:m Retransmission timeoutm Fast retransmit

• Three duplicate ACKs

r Retransmission mechanismm ARQ: Go-Back-N,

selected retransmissions

waitfor

event

waitfor

event

event: data received from application abovecreate, send segment

event: timer timeout for segment with seq # y

retransmit segment

event: ACK received,with ACK # y

ACK processing

r Simplified sender Assumptionm One way data transferm No flow, congestion

control

19

TCP: Retransmission Scenarios

Seq=92, 8 bytes data

ACK=100

lossX

Seq=92, 8 bytes data

ACK=100

time

Host A

lost ACK scenario

Host B

timeo

ut

ACK=100

Seq=92, 8 bytes data

Host A

time premature timeout,cumulative ACKs

Host B

ACK=120

Seq=100, 20 bytes data

Seq=92, 8 bytes data

Seq=

92 t

imeo

utSe

q=10

0 tim

eout

ACK=120

20

TCP ACK Generation [RFC 1122, RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq #. All data up toexpected seq # already ACKed

Arrival of in-order segment withexpected seq #. One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq. # .Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK. Wait up to 500msfor next segment. If no next segment,send ACK (reduces ACK traffic)

Immediately send single cumulative ACK, ACKing both in-order segments

Immediately send duplicate ACK, indicating seq. # of next expected byte(trigger fast retransmit)

Immediate send ACK, provided thatsegment starts at lower end of gap

21

TCP Retransmission Timeout

r TCP uses one timer for one pkt onlyr Retransmission Timeout (RTO) calculated dynamically

m Based on Round Trip Time estimation (RTT)m Wait at least one RTT before retransmittingm Importance of accurate RTT estimators:

• Low RTT à unneeded retransmissions• High RTT à poor throughput

m RTT estimator must adapt to change in RTT• But not too fast, or too slow!

m Spurious timeouts• “Conservation of packets” principle – more than a window worth of

packets in flight

22

Retransmission Timeout Estimatorr Round trip times exponentially averaged:

m New RTT = α (old RTT) + (1 - α) (new sample)

m 0.875 for most TCP’sr Retransmit timer set to β RTT, where β = 2

m Every time timer expires, RTO exponentially backed-off

r Key observation: At high loads round trip variance is high

r Solution (currently in use):m Base RTO on RTT and standard deviation of RTT:

RTT + 4 * rttvarm rttvar = χ * dev + (1- χ)rttvar

• dev = linear deviation (also referred to as mean deviation)• Inappropriately named – actually smoothed linear deviation

m RTO is discretized into ticks of 500ms (RTO >= 2ticks)

23

Retransmission AmbiguityA B

ACK

SampleRTT

Original transmission

retransmission

RTO

A BOriginal transmission

retransmissionSampleRTT

ACKRTOX

r Karn’s RTT Estimatorm If a segment has been retransmitted:m Don’t count RTT sample on ACKs for this segmentm Keep backed off time-out for next packetm Reuse RTT estimate only after one successful transmission

24

TCP Flow Control: Sliding Window Proto.Receiver: Explicitly informs

sender of (dynamically changing) amount of free buffer space m rcvr window size field in TCP segment

Sender: Amount of transmitted, unACKeddata less than most recently-receiver rcvr window size

sender won’t overrunreceiver’s buffers by

transmitting too much,too fast

flow control

receiver buffering

25

TCP Flow Controlr TCP is a sliding window protocolm For window size n, can send up to n bytes without

receiving an acknowledgement mWhen the data is acknowledged then the window

slides forward

rOriginal TCP always sent entire windowm Congestion control now limits this via congestion

window determined by the sender! (network limited)m If not data rate is receiver limited

r Silly window syndromem Too many small packets in flightm Limit the # of smaller pkts than MSS to one per RTT

26

Window Flow Control:

Sent but not acked Not yet sent

sender window

Next to be sent

Sent and acked

Acked but notdelivered to user

Not yetacked

Receive buffer

rcvr window

Sender Side

Receiver Side

27

Ideal Window Size

r Ideal size = delay * bandwidthm Delay-bandwidth product (RTT * bottleneck bitrate)

rWindow size < delay*bw _ wasted bandwidth

rWindow size > delay*bw _m Queuing at intermediate routers _ increased RTT

m Eventually packet loss

28

TCP Connection Management

Recall: TCP sender, receiver establish “connection”before exchanging data segments

r Initialize TCP variables:m Seq. #sm Buffers, flow control info (e.g. RcvWindow)mMSS and other options

r Client: connection initiator, server: contacted by client

r Three-way handshake m Simultaneous open

r TCP Half-Close (four-way handshake)r Connection aborts via RSTs

29

TCP Connection Management (2)Three way handshake:

Step 1: Client end system sends TCP SYN control segment to serverm Specifies initial seq #m Specifies initial window #

Step 2: Server end system receives SYN, replies with SYNACK control segment

m ACKs received SYNm Allocates buffersm Specifies server ? receiver initial seq. #m Specifies initial window #

Step 3: Client system receives SYNACK

30

TCP Connection Management (3)

Closing a connection:

Client closes socket:clientSocket.close();

Step 1: Client end system sends TCP FIN control segment to server

Step 2: Server receives FIN, replies with ACK. Closes connection, sends FIN.

ACK

client server

FINclose

FINclose

31

TCP Connection Management (4)

Step 3: Client receives FIN, replies with ACK.

m Enters “timed wait” – will respond with ACK to received FINs

Step 4: Server, receives ACK. Connection closed.

Note: With small modification, can handly simultaneous FINs.

client

FIN

server

ACK

FIN

closing

closing

closed

timed

wai

t ACK

closed

32

TCP Connection Management (5)

TCP client lifecycle

33

TCP Connection Management (cont)TCP server lifecycle

34

TCP state machine

35

Excursion: Congestion Control Principles

36

TCP Acknowledgement Clocking

r TCP is “self-clocking”r New data sent when old data is ackedr Ensures an “equilibrium”r But how to get started? m Slow Startm Congestion Avoidance

rOther TCP featuresm Fast Retransmissionm Fast Recovery

37

TCP Congestion Control:

r Two “phases”m Slow startm Congestion avoidance

r Important variables:m Congwinm threshold (ssthresh):

Defines threshold between two slow start phase, congestion control phase

r “Probing” for usable bandwidth:m Ideally: Transmit as fast as

possible (Congwin as large as possible) without loss

m Increase Congwin until loss (congestion)

m Loss: Decrease Congwin, then begin probing (increasing) again

38

TCP Slowstart

r Exponential increase (per RTT) in window size (not so slow!)

r Loss event: Timeout (Tahoe TCP) and/or or three duplicate ACKs (Reno TCP)

initialize: Congwin = 1for (each segment ACKed)

Congwin++until (loss event OR

CongWin > threshold)

Slowstart algorithm

one segment

RTT

Host A Host B

time

two segments

four segments

39

Congestion Avoidance

r Loss implies congestion – why?m Not necessarily true on all link types

r If loss occurs when cwnd = Wm Network can handle 0.5W ~ W segmentsm Set cwnd to 0.5W (multiplicative decrease)

r Upon receiving new ACKm Increase cwnd by 1/cwnd m Results in additive increase

40

TCP Congestion Avoidance

/* slowstart is over */ /* Congwin > threshold */Until (loss event) {every w segments ACKed:

Congwin++}

threshold = Congwin/2Congwin = 1perform slowstart

Congestion avoidance

1

1: TCP Reno skips slowstart (fast recovery) after three duplicate ACKs

41

Return to Slow Start

r If packet is lost we loose self clockingm Need to implement slow-start and congestion

avoidance together

rWhen timeout occurs m Set threshold to 0.5 W (current window size)m Set cwnd to one segment

rWhen three duplicate acks occur:m Set threshold to 0.5 Wm Retransmit missing segment == Fast Retransmitm cwnd = threshold + number of dupacksm Upon receiving acks cwnd = threshold (cut in half!)m Use congestion avoidance == Fast Recovery

42

TCP Congestion Control

r End-end control (no network assistance)r TCP throughput limited by rcvr window (flow control)r Transmission rate limited by congestion window size,

Congwin, over segments:

r w segments, each with MSS bytes sent in one RTT

Congwin

43

0

5

10

15

20

25

30

Time

Num

ber DATA

ACKcwndinFlight

After fast recovery

Fast Recovery Example

r cwnd =6; in congestion avoidance

44

Sequence Number Plot (Simulation)

45

Seq. Number Plot (Simulation) zoom

46

TCP Flavors / Variants

r TCP Tahoem Slow Startm Congestion Avoidancem Timeout, 3 duplicate acks ? cwnd = 1 _ slow start

r TCP Renom Slow-startm Congestion avoidancem Fast retransmit, Fast recoverym Timeout ? cwnd = 1 _ slow startm Three duplicate acks ? Fast Recovery,

Congestion Avoidance

47

Extensions

r Fast recovery, multiple losses per RTT _ timeoutr TCP New-Reno

m Stay in fast recovery until all packet losses in window are recovered

m Can recover 1 packet loss per RTT without causing a timeout

r Selective Acknowledgements (SACK) [rfc2018]m Provides information about out-of-order packets

received by receiverm Can recover multiple packet losses per RTT

48

Additional TCP Features

r Urgent Datam Nice for interactive applicationsm In-Band via urgent pointer

r Nagle algorithmm Avoidance of small segmentsm Needed for interactive applicationsmMethodology: only one outstanding packet can be small

49

Summary

r Reviewed principles of transport layer:m Reliable data transferm Flow controlm Congestion controlm (Multiplexing)

r Instantiation in the InternetmUDPmTCP