1 School of Computing Science Simon Fraser University CMPT 771/471: Internet Architecture and...


School of Computing Science

Simon Fraser University

CMPT 771/471: Internet Architecture and Protocols

Transport Layer

Instructor: Dr. Mohamed Hefeeda


Review of Basic Networking Concepts

Internet structure
Protocol layering and encapsulation
Internet services and socket programming

Network layer
  Network types: circuit switching, packet switching
  Addressing, forwarding, routing

Transport layer
  Reliability, congestion and flow control
  TCP, UDP

Link layer
  Multiple access protocols
  Ethernet


Transport services and protocols

provide logical communication between app processes running on different hosts

transport protocols run in end systems

send side: breaks app messages into segments, passes to network layer

rcv side: reassembles segments into messages, passes to app layer

more than one transport protocol available to apps

Internet: TCP and UDP

[Figure: logical end-end transport between two end systems, each running the application, transport, network, data link, and physical layers; the intermediate nodes run only the network, data link, and physical layers]


Transport vs. network layer

network layer: logical communication between hosts

transport layer: logical communication between processes

relies on, enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids
  processes = kids
  app messages = letters in envelopes
  hosts = houses
  transport protocol = Ann and Bill
  network-layer protocol = postal service


Multiplexing/demultiplexing

[Figure: host 1, host 2, and host 3, each with an application, transport, network, link, and physical layer; processes P1 to P4 attach to the transport layer through sockets]

Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)

Demultiplexing at rcv host: delivering received segments to correct socket


Connectionless demux

[Figure: two clients (IP A and IP B) send datagrams from source ports 9157 and 5775 to destination port 6428 on the server (IP C); the replies carry source port 6428 and destination ports 9157 and 5775]

UDP socket identified by: (dst IP, dst Port)

datagrams with different src IPs and/or src ports are directed to same socket


Connection-oriented demux (cont)

[Figure: clients with IPs A and B send segments to destination port 80 on the server (IP C), using source ports 9157 and 5775; each connection is served by its own socket at the server]

TCP socket identified by 4-tuple: (src IP, src Port, dst IP, dst Port)
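
The difference between the two identifiers can be sketched with two lookup tables. This is only an illustration: the table contents, the socket labels, and the function names `udp_demux` and `tcp_demux` are hypothetical and are not part of any real socket API.

```python
# Hypothetical demultiplexing tables; the keys mirror the identifiers on the slides.
udp_sockets = {("C", 6428): "udp-socket"}        # keyed by (dst IP, dst port)
tcp_sockets = {                                  # keyed by (src IP, src port, dst IP, dst port)
    ("A", 9157, "C", 80): "tcp-socket-1",
    ("B", 9157, "C", 80): "tcp-socket-2",
    ("B", 5775, "C", 80): "tcp-socket-3",
}

def udp_demux(src_ip, src_port, dst_ip, dst_port):
    """UDP ignores the source fields when choosing a socket."""
    return udp_sockets.get((dst_ip, dst_port))

def tcp_demux(src_ip, src_port, dst_ip, dst_port):
    """TCP uses all four fields, so each connection maps to its own socket."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))
```

Datagrams from A and B both reach the single UDP socket, while the TCP segments to port 80 are split across three sockets even though two of them share a source port.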


UDP: User Datagram Protocol [RFC 768]

“no frills,” “bare bones” Internet transport protocol

“best effort” service, UDP segments may be:
  lost
  delivered out of order to app

Connectionless:
  no handshaking between UDP sender, receiver
  each UDP segment handled independently of others

Why is there a UDP?
  no connection establishment (which can add delay)
  simple: no connection state at sender, receiver
  small segment header
  no congestion control: UDP can blast away as fast as desired


UDP

often used for streaming multimedia apps
  loss tolerant
  rate sensitive

other UDP uses
  DNS
  SNMP

reliable transfer over UDP: add reliability at application layer
  application-specific error recovery!

UDP segment format (each row is 32 bits):
  source port # | dest port #
  length | checksum
  application data (message)

length: length, in bytes, of UDP segment, including header
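
As a rough sketch of the segment format, the four 16-bit header fields can be packed with Python's `struct` module. The function name `udp_header` is made up for this example, and the checksum is left at 0 (RFC 768 allows an all-zero value to mean that no checksum was generated), so this is not a complete UDP implementation.

```python
import struct

UDP_HEADER_LEN = 8  # four 16-bit fields

def udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack source port, dest port, length, checksum in network byte order."""
    length = UDP_HEADER_LEN + len(payload)   # length covers header plus data
    checksum = 0                             # assumption: checksum not computed in this sketch
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)
```

For a 5-byte message the length field is 13, since the 8 header bytes are counted too.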


Reliable data transfer

important in application, transport, and link layers

top-10 list of important networking topics!

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)


Pipelined (Sliding Window) Protocols

Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver

Two generic forms of pipelined protocols: Go-Back-N, selective repeat


Go-Back-N

Sender:
  k-bit seq # in pkt header
  “window” of up to N consecutive unack’ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n (cumulative ACK)
    may receive duplicate ACKs (see receiver)
  timer for each in-flight pkt
  timeout(n): retransmit pkt n and all higher seq # pkts in window, i.e., go back to n
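
A round-based sketch of the sender behaviour may help. It makes several simplifying assumptions that are not on the slide: the sender transmits a full window and then waits for the cumulative ACK or a timeout, ACKs are never lost, the receiver discards out-of-order pkts, and each seq # in the hypothetical `lost_once` set is dropped only on its first transmission.

```python
def gbn_transmissions(num_pkts, window, lost_once):
    """Return the seq #s put on the wire, in order, by a simplified GBN sender."""
    sent = []                     # log of every transmission
    lost = set(lost_once)         # pkts whose first transmission is dropped
    base = 0                      # oldest unACKed seq #
    while base < num_pkts:
        expected = base           # next in-order seq # the receiver waits for
        for n in range(base, min(base + window, num_pkts)):
            sent.append(n)
            if n in lost:
                lost.discard(n)   # dropped now, delivered when retransmitted
            elif n == expected:
                expected += 1     # accepted in order and ACKed cumulatively
            # any other pkt arrives out of order and is discarded
        base = expected           # window slides to the cumulative ACK (or stays put on timeout)
    return sent
```

With N = 4, losing pkt 2 once forces pkt 3 to be sent twice even though it arrived intact the first time, which is the waste the next slides discuss.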


GBN in action

[Figure: GBN sender/receiver timeline with window size N = 4; after a timeout the sender goes back to pkt 2]


Go-Back-N

Do you see potential problems with GBN?

Consider high-speed links with long delays (called large bandwidth-delay product pipes)
  GBN can fill that pipe by having large N: many unACKed pkts could be in the pipe
  A single lost pkt could cause a retransmission of a huge number (up to N) of pkts: waste of bandwidth

Solutions??


Selective Repeat

receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer

sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt

sender window
  N consecutive seq #’s
  again limits seq #s of sent, unACKed pkts
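
The receiver side can be sketched as follows. The class name `SRReceiver` and its fields are invented for illustration, and sequence numbers are treated as unbounded integers rather than k-bit values, which sidesteps wrap-around.

```python
class SRReceiver:
    """Selective-repeat receiver sketch: ACK each pkt, buffer out-of-order ones."""

    def __init__(self, window):
        self.window = window
        self.base = 0          # next seq # to deliver in order
        self.buffer = {}       # out-of-order pkts waiting for a gap to fill
        self.delivered = []    # data handed to the upper layer, in order

    def receive(self, seq, data):
        """Return the seq # to ACK, or None if the pkt lies beyond the window."""
        if self.base <= seq < self.base + self.window:
            self.buffer[seq] = data
            while self.base in self.buffer:      # deliver any in-order run
                self.delivered.append(self.buffer.pop(self.base))
                self.base += 1
            return seq
        if seq < self.base:
            return seq         # already delivered: re-ACK so the sender can move on
        return None
```

Receiving pkts 0, 2, 1 in that order delivers "a" first and then "b" and "c" together once the gap at 1 is filled.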


Selective repeat: sender, receiver windows


TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

point-to-point: one sender, one receiver

reliable, in-order byte stream: no “message boundaries”

pipelined: TCP congestion and flow control set window size

send & receive buffers

full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size

connection-oriented: handshaking (exchange of control msgs) init’s sender, receiver state before data exchange

flow controlled: sender will not overwhelm receiver

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door]


TCP segment structure

Header layout (each row is 32 bits):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pointer
  Options (variable length)
  application data (variable length)

Field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  Receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)


TCP reliable data transfer

TCP creates rdt service on top of IP’s unreliable service

Pipelined segments
Cumulative acks
TCP uses single retransmission timer

Retransmissions are triggered by:
  timeout events
  duplicate acks

Initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control


TCP sender events:

data rcvd from app:
  Create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running (think of timer as for oldest unacked segment)
  expiration interval: TimeOutInterval

timeout:
  retransmit segment that caused timeout
  restart timer

Ack rcvd:
  If acknowledges previously unacked segments
    update what is known to be acked
    start timer if there are outstanding segments


TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }

} /* end of loop forever */
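
The same three events can be written as methods on a small class. This is a sketch under assumptions of its own: the class `SimpleTCPSender`, the `wire` list standing in for "pass segment to IP", and the boolean `timer_running` flag are all hypothetical, and no real timer or network is involved.

```python
class SimpleTCPSender:
    """Event handlers for the simplified sender (no flow or congestion control)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq # -> data for not-yet-acknowledged segments
        self.timer_running = False
        self.wire = []               # (seq #, data) pairs handed to IP

    def on_data(self, data):
        """Event: data received from application above."""
        self.unacked[self.next_seq_num] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((self.next_seq_num, data))
        self.next_seq_num += len(data)

    def on_timeout(self):
        """Event: timer timeout; resend the oldest unacknowledged segment."""
        if not self.unacked:
            return
        seq = min(self.unacked)
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y):
        """Event: ACK received with ACK field value y (cumulative)."""
        if y > self.send_base:
            self.send_base = y
            # keep only segments not fully covered by the cumulative ACK
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at seq 92 and 20 bytes at seq 100, then calling `on_timeout()`, resends only the segment at 92; a later `on_ack(120)` clears both segments and stops the timer.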


TCP: retransmission scenarios

[Figure, lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A times out and resends Seq=92, 8 bytes data; B replies ACK=100 again; SendBase = 100]

[Figure, premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout fires before the ACKs arrive, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase moves from 100 to 120]


TCP retransmission scenarios (more)

[Figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is retransmitted; SendBase = 120]


TCP Round Trip Time and Timeout

If TCP timeout is
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss

Q: how to set TCP timeout value?
  Based on Round Trip Time (RTT), but RTT itself varies with time!
  We need to estimate current RTT

RTT Estimation
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT “smoother”
    average several recent measurements, not just current SampleRTT


TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT

Exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125


Example RTT estimation:

[Figure: SampleRTT and Estimated RTT (milliseconds, axis from 100 to 350) versus time (seconds) for the path gaia.cs.umass.edu to fantasia.eurecom.fr]


TCP Round Trip Time and Timeout

Setting the timeout

EstimatedRTT plus safety margin
  large variation in EstimatedRTT -> larger safety margin

first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|

(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT
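
One update step of the estimator can be sketched in a few lines. The function name `update_rtt` is made up, the weights 0.125 and 0.25 are the typical values quoted on the slides, and updating DevRTT before EstimatedRTT (so the deviation is measured against the previous estimate) is an assumption; the slides do not state the order.

```python
ALPHA = 0.125   # weight of the newest sample in EstimatedRTT
BETA = 0.25     # weight of the newest deviation in DevRTT

def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
    """Apply one SampleRTT; return (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # assumption: deviation is taken against the estimate before this sample
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval
```

A single sample far from the estimate moves EstimatedRTT only slightly but raises DevRTT noticeably, so the timeout grows mainly through the safety margin.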


Fast Retransmit

Time-out period often relatively long:

long delay before resending lost packet

Detect lost segments via duplicate ACKs.

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs.

If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:

fast retransmit: resend segment before timer expires
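
A sketch of the duplicate-ACK counter follows. It interprets the rule as three duplicates arriving after the original ACK, matching the "3 duplicate acks" loss event used later in the deck; the function name and the list-of-ACKs input are assumptions made for the example.

```python
def fast_retransmit_points(acks, dup_threshold=3):
    """Return the ACK values at which a fast retransmit would fire.

    acks: ACK numbers in arrival order.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                triggers.append(ack)   # resend the segment that starts at byte `ack`
        else:
            last_ack = ack             # new data acknowledged: reset the counter
            dup_count = 0
    return triggers
```

With arrivals 100, 120, 120, 120, 120, 140 the fourth ACK for 120 is the third duplicate, so the segment starting at byte 120 is resent without waiting for the timer.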


TCP Connection Management: opening

TCP: 3-way handshake

Step 1: client host sends TCP SYN segment to server
  specifies initial seq #
  no data

Step 2: server host receives SYN, replies with SYNACK segment
  server allocates buffers
  specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

[Figure: client sends SYN=1, seq=client_isn (conn. request); server replies SYN=1, seq=server_isn, ack=client_isn+1 (conn. granted); client replies SYN=0, seq=client_isn+1, ack=server_isn+1]

Q. How would a hacker exploit TCP 3-way handshake to bring a server down?

A. SYN Flood DoS attack
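
The sequence and acknowledgement numbers exchanged in the three steps can be traced with a small helper. The function name and the dictionary representation of a segment are illustrative only; real segments carry many more header fields.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK, and ACK segments as dicts of header fields."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]
```

Note that the server commits state (buffers, its own initial seq #) as soon as it sends the SYNACK, before the third segment arrives; a SYN flood leaves many such half-open entries behind.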


TCP Connection Management: closing

Step 1: client end system sends TCP FIN segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

Step 3: client receives FIN, replies with ACK

Enters “timed wait” – may need to re-send ACK to received FINs

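Step 4: server receives ACK. Connection closed

[Figure: client sends FIN (closing); server replies ACK, then sends FIN (closing); client replies ACK and stays in timed wait before it is closed; server is closed once it receives the ACK]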


TCP Connection Management

[Figure: TCP client lifecycle]

[Figure: TCP server lifecycle]


TCP Flow Control

receive side of TCP connection has a receive buffer

app process may be slow at reading from buffer

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

speed-matching service: matching the send rate to the receiving app’s drain rate


TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

guarantees receive buffer doesn’t overflow
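
Both sides of the mechanism fit in two short functions. The names `rcv_window` and `can_send` and their parameters are chosen for this sketch; the sender check assumes unACKed data is measured as LastByteSent - LastByteAcked.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_window
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes already in flight may add 1000 more but not 1100.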


Congestion Control

Congestion: sources send too much data for network to handle
  different from flow control, which is e2e

Congestion results in …
  lost packets (buffer overflow at routers)
    more work (retransmissions) for given “goodput”
  long delays (queueing in router buffers)
    Premature (unneeded) retransmissions
  Waste of upstream links’ capacity
    Pkt traversed several links, then dropped at congested router


Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

Network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate sender should send at


TCP congestion control: Approach

Approach: probe for usable bandwidth in network
  increase transmission rate until loss occurs, then decrease
  Additive increase, multiplicative decrease (AIMD)

[Figure: congestion window (rate, CongWin) versus time, with 8, 16, and 24 Kbytes marked; sawtooth behavior: probing for bandwidth]


TCP Congestion Control

Sender keeps a new variable, Congestion Window (CongWin), and limits unacked bytes to:

LastByteSent - LastByteAcked ≤ min {CongWin, RcvWin}

For our discussion: assume RcvWin is large enough

Roughly, what is the sending rate as a function of CongWin?
  Ignore loss and transmission delay

Rate = CongWin/RTT (bytes/sec)

So, rate and CongWin are somewhat synonymous
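
A short numeric sketch of the two relations on this slide. The function names are invented here, and RTT is passed in milliseconds purely to keep the arithmetic in the example exact.

```python
def allowed_unacked(cong_win, rcv_win):
    """Upper bound on LastByteSent - LastByteAcked, in bytes."""
    return min(cong_win, rcv_win)

def approx_rate_bytes_per_sec(cong_win_bytes, rtt_ms):
    """Rate = CongWin / RTT, ignoring loss and transmission delay."""
    return cong_win_bytes * 1000 / rtt_ms
```

For CongWin = 500 bytes and RTT = 200 msec this gives 2500 bytes/sec, which is 20 kbps; doubling CongWin at a fixed RTT doubles the rate.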


TCP Congestion Control

Congestion occurs at routers (inside the network)
  Routers do not provide any feedback to TCP

How can TCP infer congestion?
  From its symptoms: timeout or duplicate acks
  Define loss event ≡ timeout or 3 duplicate acks
  TCP decreases its CongWin (rate) after a loss event

TCP Congestion Control Algorithm: three components
  AIMD: additive increase, multiplicative decrease
  slow start
  Reaction to timeout events


AIMD

additive increase: (congestion avoidance phase)
  increase CongWin by 1 MSS every RTT until loss detected
  TCP increases CongWin by: MSS x (MSS/CongWin) for every ACK received
  Ex. MSS = 1,460 bytes and CongWin = 14,600 bytes
    With every ACK, CongWin is increased by 146 bytes

multiplicative decrease: cut CongWin in half after loss
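
The per-ACK arithmetic can be checked with two one-line functions. The function names are hypothetical, and the MSS of 1,460 bytes is taken from the example on this slide.

```python
MSS = 1460  # bytes, as in the slide's example

def on_ack_congestion_avoidance(cong_win):
    """Additive increase: grow by MSS * (MSS / CongWin) bytes per ACK."""
    return cong_win + MSS * MSS / cong_win

def on_loss(cong_win):
    """Multiplicative decrease: cut the window in half."""
    return cong_win / 2
```

A window of 14,600 bytes holds ten segments, so roughly ten ACKs arrive per RTT and the ten increments of about 146 bytes add up to about one MSS per RTT.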

[Figure: congestion window (CongWin) versus time, with 8, 16, and 24 Kbytes marked]


TCP Slow Start

When connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = CongWin/RTT = 20 kbps

available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to respectable rate

Slow start: When connection begins, increase rate exponentially fast until first loss event.
  How can we do that? double CongWin every RTT.
  How? Increment CongWin by 1 MSS for every ACK received


TCP Slow Start (cont’d)

Increment CongWin by 1 MSS for every ACK

Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment to Host B; one RTT later it sends two segments, then four segments]
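
Why one MSS per ACK doubles the window each RTT can be seen by counting ACKs. The sketch below assumes no loss and one ACK per segment; the function name is made up for this example.

```python
def slow_start_windows(mss, rtts):
    """CongWin (bytes) at the start of each RTT when every segment is ACKed."""
    cong_win = mss
    history = []
    for _ in range(rtts):
        history.append(cong_win)
        acks = cong_win // mss      # one ACK per segment sent in this RTT
        cong_win += acks * mss      # +1 MSS per ACK, so the window doubles
    return history
```

With MSS = 500 bytes the window goes 500, 1000, 2000, 4000 over four RTTs, matching the one, two, four segment pattern.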


Reaction to a Loss event

TCP Tahoe (Old)
  Threshold = CongWin / 2
  Set CongWin = 1 MSS
  Slow start till threshold
  Then Additive Increase // congestion avoidance

TCP Reno (most current TCP implementations)
  If 3 dup acks // fast retransmit
    Threshold = CongWin / 2
    Set CongWin = Threshold // fast recovery
    Additive Increase
  Else // timeout
    Same as TCP Tahoe


Reaction to a Loss event (cont’d)

Why differentiate between 3 dup acks and timeout?
  3 dup ACKs indicate network capable of delivering some segments
  timeout indicates a “more alarming” congestion scenario


TCP Congestion Control: Summary

Initially

Threshold is set to large value (65 Kbytes), has no effect

CongWin = 1 MSS

Slow Start (SS): CongWin grows exponentially till a loss event occurs (timeout or 3 dup ack) or reaches Threshold

Congestion Avoidance (CA): CongWin grows linearly

3 duplicate ACKs occur:

Threshold = CongWin/2; CongWin = Threshold; CA

Timeout occurs:

Threshold = CongWin/2; CongWin = 1 MSS; SS till Threshold
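
The summary rules can be collected into a small state holder. This is a sketch, not a faithful TCP Reno implementation: the class name `RenoWindow` is invented, the initial threshold of 65,000 bytes is an assumed reading of "65 Kbytes", and details such as window inflation during fast recovery are left out.

```python
class RenoWindow:
    """Tracks CongWin and Threshold (bytes) for the events in the summary."""

    def __init__(self, mss, threshold=65_000):
        self.mss = mss
        self.threshold = threshold      # assumed initial value, large enough to have no effect
        self.cong_win = mss

    def on_ack(self):
        if self.cong_win < self.threshold:
            self.cong_win += self.mss                              # slow start
        else:
            self.cong_win += self.mss * self.mss / self.cong_win   # congestion avoidance

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold                             # continue in CA

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss                                   # back to slow start
```

After three ACKs in slow start with MSS = 1000, CongWin is 4000; three duplicate ACKs halve it to 2000 and growth continues linearly, whereas a timeout resets it to one MSS.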


TCP throughput

What’s the average throughput of TCP as a function of window size and RTT?
  Ignore slow start

Let W be the window size when loss occurs
  When window is W, throughput is W/RTT
  Just after loss, window drops to W/2, throughput to W/(2*RTT)

Average throughput: 0.75 W/RTT
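
The 0.75 factor is the midpoint of a window that climbs linearly from W/2 to W between losses. The numeric check below assumes exactly that idealized sawtooth; the function name and the `steps` parameter are made up for the example.

```python
def average_throughput(w, rtt, steps=1000):
    """Average of window/RTT as the window rises linearly from W/2 to W."""
    samples = [(w / 2 + (w / 2) * i / steps) / rtt for i in range(steps + 1)]
    return sum(samples) / len(samples)
```

Because the samples are evenly spaced between W/(2*RTT) and W/RTT, their mean lands on 0.75 W/RTT regardless of the number of steps.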