TDC561 Network Programming


Page 1: TDC561  Network Programming

TDC561 Network Programming

Camelia Zlatea, PhD

Email: [email protected]

Week 10: Performance Aspects of End-to-End (Transport) Protocols and API Programming

Page 2: TDC561  Network Programming


Underlying best-effort network:
– drops messages
– re-orders messages
– delivers duplicate copies of a given message
– limits messages to some finite size
– delivers messages after an arbitrarily long delay

Common end-to-end services:
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– support arbitrarily large messages
– support synchronization
– allow the receiver to apply flow control to the sender
– support multiple application processes on each host

End-to-End (Transport) Protocols

Page 3: TDC561  Network Programming


Simple Demultiplexor (UDP)

Unreliable and unordered datagram service
Adds multiplexing
No flow control
Endpoints identified by ports
– servers have well-known ports
– see /etc/services on Unix

Optional checksum
– pseudo header + UDP header + data

Header format (16-bit fields; bit positions 0, 16, 31):
  SrcPort | DstPort
  Checksum | Length
  Data
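For concreteness, here is the header as a C struct — a minimal sketch following the RFC 768 wire order, which puts Length before Checksum (the slide's figure draws Checksum first):

    #include <stdint.h>

    /* UDP header: four 16-bit fields, carried in network byte order. */
    struct udp_header {
        uint16_t src_port;   /* SrcPort */
        uint16_t dst_port;   /* DstPort */
        uint16_t length;     /* header + data, in bytes */
        uint16_t checksum;   /* optional in IPv4; covers pseudo header +
                                UDP header + data */
    };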

Page 4: TDC561  Network Programming


Simple Demultiplexor (UDP)

[Figure: UDP demultiplexing — arriving packets are demultiplexed by port into per-port message queues, one per application process.]

Page 5: TDC561  Network Programming


Reliable Byte-Stream (TCP)

Connection-oriented
Byte-stream:
– sending process writes some number of bytes
– TCP breaks into segments and sends via IP
– receiving process reads some number of bytes
Full duplex
Flow control: keeps sender from overrunning receiver
Congestion control: keeps sender from overrunning network

Page 6: TDC561  Network Programming


Reliable Byte-Stream (TCP)

[Figure: How TCP manages a byte stream — the sending application writes bytes into TCP's send buffer, TCP transmits them as segments, and the receiving application reads bytes out of TCP's receive buffer.]

Page 7: TDC561  Network Programming


End-to-End Issues

Based on the sliding window protocol used at the data link level, but the situation is very different:

Potentially connects many different hosts
– need explicit connection establishment and termination

Potentially different RTT (Round-Trip Time)
– need adaptive timeout mechanism

Potentially long delay in network
– need to be prepared for arrival of very old packets

Potentially different capacity at destination
– need to accommodate different amounts of buffering

Potentially different network capacity
– need to be prepared for network congestion

Page 8: TDC561  Network Programming


Segment Format

Each connection identified with a 4-tuple:
– <SrcPort, SrcIPAddr, DstPort, DstIPAddr>

Sliding window + flow control:
– Acknowledgment, SequenceNum, AdvertisedWindow

Flags: SYN, FIN, RESET, PUSH, URG, ACK

Checksum: pseudo header + TCP header + data

Header format (bit positions 0, 4, 10, 16, 31):
  SrcPort | DstPort
  SequenceNum
  Acknowledgment
  HdrLen | 0 | Flags | AdvertisedWindow
  Checksum | UrgPtr
  Options (variable)
  Data

[Figure: the sender transmits Data (SequenceNum); the receiver returns Acknowledgment + AdvertisedWindow.]
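A minimal sketch of the fixed 20-byte header as a C struct (RFC 793 field order; for simplicity the 4-bit HdrLen, reserved bits, and the 6 flag bits are packed into a single 16-bit field here):

    #include <stdint.h>

    /* TCP header, fields in network byte order. */
    struct tcp_header {
        uint16_t src_port;           /* SrcPort */
        uint16_t dst_port;           /* DstPort */
        uint32_t sequence_num;       /* SequenceNum: first data byte in segment */
        uint32_t acknowledgment;     /* next byte expected by the receiver */
        uint16_t hdrlen_flags;       /* 4-bit HdrLen, reserved bits, and the
                                        URG/ACK/PUSH/RESET/SYN/FIN flags */
        uint16_t advertised_window;  /* receiver's flow-control window */
        uint16_t checksum;           /* pseudo header + TCP header + data */
        uint16_t urg_ptr;            /* UrgPtr */
        /* options (variable length) and data follow */
    };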

Page 9: TDC561  Network Programming


TCP Sliding Window Revisited

[Figure: (a) send buffer with pointers LastByteAcked, LastByteSent, LastByteWritten; (b) receive buffer with pointers LastByteRead, NextByteExpected, LastByteRcvd.]

Relationship between TCP send buffer (a) and receive buffer (b)

Each byte has a sequence number
ACKs are cumulative

Page 10: TDC561  Network Programming


Sending side
– LastByteAcked <= LastByteSent
– LastByteSent <= LastByteWritten
– bytes between LastByteAcked and LastByteWritten must be buffered

Receiving side
– LastByteRead < NextByteExpected
– NextByteExpected <= LastByteRcvd + 1
– bytes between NextByteRead and LastByteRcvd must be buffered

TCP Sliding Window Revisited
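As a sketch, the send-side invariants can be written as C assertions (the state struct and field names here are hypothetical, not part of TCP itself):

    #include <assert.h>
    #include <stdint.h>

    struct tcp_send_state {
        uint64_t last_byte_acked;    /* LastByteAcked */
        uint64_t last_byte_sent;     /* LastByteSent */
        uint64_t last_byte_written;  /* LastByteWritten */
    };

    void check_send_invariants(const struct tcp_send_state *s)
    {
        assert(s->last_byte_acked <= s->last_byte_sent);    /* can't ack unsent bytes */
        assert(s->last_byte_sent <= s->last_byte_written);  /* can't send unwritten bytes */
        /* bytes in (last_byte_acked, last_byte_written] must stay buffered
           for possible retransmission */
    }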

Page 11: TDC561  Network Programming


Flow Control

Sender buffer size: MaxSendBuffer
Receive buffer size: MaxRcvBuffer

Receiving side
– LastByteRcvd - NextByteRead <= MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - NextByteRead)

Sending side
– NextByteExpected <= LastByteRcvd + 1
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
– block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer

Always send ACK in response to an arriving data segment

Persist when AdvertisedWindow = 0
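The window arithmetic above, as a small C sketch (variable names are hypothetical; all quantities in bytes):

    #include <stdint.h>

    /* Receiving side: advertise whatever buffer space remains. */
    uint32_t advertised_window(uint32_t max_rcv_buffer,
                               uint64_t last_byte_rcvd,
                               uint64_t next_byte_read)
    {
        return max_rcv_buffer - (uint32_t)(last_byte_rcvd - next_byte_read);
    }

    /* Sending side: how many additional bytes may be put in flight. */
    uint32_t effective_window(uint32_t advertised,
                              uint64_t last_byte_sent,
                              uint64_t last_byte_acked)
    {
        uint32_t in_flight = (uint32_t)(last_byte_sent - last_byte_acked);
        return advertised > in_flight ? advertised - in_flight : 0;
    }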

Page 12: TDC561  Network Programming


Network as a Pipe

[Figure: the network as a pipe — Delay/Latency is the pipe's length, Bandwidth its width.]

Page 13: TDC561  Network Programming


Keeping the Pipe Full

TCP Correctness and Performance Aspect
– Size of the SequenceNum and AdvertisedWindow fields affects the correctness and performance of TCP
– Wrap around: 32-bit SequenceNum
– Bandwidth vs. time until wrap around:

  Bandwidth             Time Until Wrap Around
  T1 (1.5 Mbps)         6.4 hours
  Ethernet (10 Mbps)    57 minutes
  T3 (45 Mbps)          13 minutes
  FDDI (100 Mbps)       6 minutes
  STS-3 (155 Mbps)      4 minutes
  STS-12 (622 Mbps)     55 seconds
  STS-24 (1.2 Gbps)     28 seconds
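The table follows directly from the fact that the 32-bit SequenceNum counts bytes, so it wraps after 2^32 bytes are sent; a small C sketch that reproduces it:

    #include <stdio.h>

    int main(void)
    {
        const char *link[] = { "T1", "Ethernet", "T3", "FDDI",
                               "STS-3", "STS-12", "STS-24" };
        double bps[] = { 1.5e6, 10e6, 45e6, 100e6, 155e6, 622e6, 1.2e9 };

        for (int i = 0; i < 7; i++) {
            /* 2^32 bytes = 2^32 * 8 bits until SequenceNum wraps */
            double seconds = 4294967296.0 * 8.0 / bps[i];
            printf("%-9s %10.0f s\n", link[i], seconds);
        }
        return 0;
    }

For T1 this gives about 22,900 s, i.e. the 6.4 hours in the table.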

Page 14: TDC561  Network Programming


TCP Correctness and Performance Aspect
– Bytes in transit: 16-bit AdvertisedWindow (up to 64 KBytes)
– AdvertisedWindow must be large enough to allow the sender to keep the pipe full: the (Delay x Bandwidth) product
– Bandwidth and the (Delay x Bandwidth) product dictate how big AdvertisedWindow needs to be
– Required window size for a 100-ms RTT:

  Bandwidth             Delay x Bandwidth Product
  T1 (1.5 Mbps)         18 KB
  Ethernet (10 Mbps)    122 KB
  T3 (45 Mbps)          549 KB
  FDDI (100 Mbps)       1.2 MB
  STS-3 (155 Mbps)      1.8 MB
  STS-12 (622 Mbps)     7.4 MB
  STS-24 (1.2 Gbps)     14.8 MB

– windows larger than 64 KB require TCP protocol extensions

Keeping the Pipe Full
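Each entry is just RTT x bandwidth converted to bytes; a one-function C sketch, assuming the 100-ms RTT from the slide:

    /* Delay x bandwidth product ("pipe size") in bytes.
       pipe_bytes(1.5e6, 0.1) ~= 18,750 B ~= 18 KB, matching the T1 row. */
    double pipe_bytes(double bandwidth_bps, double rtt_seconds)
    {
        return bandwidth_bps * rtt_seconds / 8.0;   /* bits -> bytes */
    }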

Page 15: TDC561  Network Programming


Separate the implementation of protocols from the interface they export.

Important at the transport layer since this defines the point where application programs typically access the network.

This interface is often called the application programming interface, or API.

Notes:
– The API is usually defined by the OS
– Example API: sockets, defined by BSD Unix but ported to other systems

Application Programming Interface

[Figure: the API sits between the Application and the Transport protocol.]

Page 16: TDC561  Network Programming


Socket Operations

Creating a socket
– int socket(int domain, int type, int protocol)
– domain = AF_INET, PF_UNIX
– type = SOCK_STREAM, SOCK_DGRAM

Passive open on server
– int bind(int socket, struct sockaddr *address, int addr_len)
– int listen(int socket, int backlog)
– int accept(int socket, struct sockaddr *address, int *addr_len)

Active open on client
– int connect(int socket, struct sockaddr *address, int addr_len)

Sending and receiving messages
– int send(int socket, char *message, int msg_len, int flags)
– int recv(int socket, char *buffer, int buf_len, int flags)
– (plain write/read may also be used, without the flags argument)
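Putting the client-side calls together — a minimal sketch of an active open followed by one send and one receive (the server address 127.0.0.1 and port 5000 are hypothetical stand-ins):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);     /* TCP socket, Internet domain */
        if (s < 0) { perror("socket"); exit(1); }

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port = htons(5000);                  /* hypothetical server port */
        srv.sin_addr.s_addr = inet_addr("127.0.0.1");

        if (connect(s, (struct sockaddr *)&srv, sizeof(srv)) < 0) {  /* active open */
            perror("connect"); exit(1);
        }

        const char *msg = "hello";
        if (send(s, msg, strlen(msg), 0) < 0)        /* send() takes flags */
            perror("send");

        char buf[128];
        ssize_t n = recv(s, buf, sizeof(buf), 0);    /* recv() likewise */
        if (n > 0)
            printf("received %zd bytes\n", n);

        close(s);
        return 0;
    }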

Page 17: TDC561  Network Programming


Performance

Page 18: TDC561  Network Programming


Performance Overview

Bandwidth
– a measure of the width of a frequency band
  » a voice-grade telephone line supports a frequency band ranging from 300 Hz to 3300 Hz; it is said to have a bandwidth of 3000 Hz; when given in Hz, bandwidth refers to the range of signals that can be accommodated
– bandwidth of a communication link = number of bits per second that can be transmitted on that link
  » ETH bandwidth is 10 Mbps
  » available bandwidth
  » measured bandwidth = number of bits per second that can actually be transmitted on that link, in practice
  » throughput = measured performance of a system
    • a pair of nodes connected by a link with a bandwidth of 10 Mbps might achieve a throughput of 2 Mbps, meaning an application on one host could send data to the other host at 2 Mbps
– bandwidth requirements of an application
  » the number of bits per second that it needs to transmit over the network to perform acceptably

Page 19: TDC561  Network Programming


[Figure: perceived latency (ms, log scale) vs. RTT (ms) for six cases: a 1-MB object, a 2-KB object, and a 1-byte object, each on a 1.5-Mbps and a 10-Mbps link.]

Latency (Response Time, delay) vs. RTT

Latency = Propagation + Transmit + Queue
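The components expand in the usual way: Propagation = Distance / SpeedOfLight and Transmit = Size / Bandwidth, while Queue depends on load along the path. A small C sketch:

    /* Latency = Propagation + Transmit + Queue (result in seconds).
       Queueing delay is passed in, since it depends on network load. */
    double latency_seconds(double distance_m, double signal_speed_mps,
                           double size_bits, double bandwidth_bps,
                           double queue_seconds)
    {
        return distance_m / signal_speed_mps   /* Propagation */
             + size_bits / bandwidth_bps       /* Transmit    */
             + queue_seconds;                  /* Queue       */
    }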

Page 20: TDC561  Network Programming


[Figure: an 8.4-Mb (1-MB) file crossing a cross-country link. (a) On a 1-Mbps link, each pipe holds 0.1 Mb, so the file is 84 pipes full of data. (b) On a 1-Gbps link, the same file fills only 1/12 of one pipe.]

Performance Overview

Latency and Bandwidth

Page 21: TDC561  Network Programming


Performance Overview

1-Mbps and 1-Gbps links have the same latency
– limited by the speed of light

To transfer a 1-MB file takes...
– many 100-ms RTTs on a 1-Mbps link (the file fills the pipe 84 times, as in the figure above)
– doesn't fill a 1-Gbps link (12.5-MB delay x bandwidth product)

In other words:
– a 1-MB file is to a 1-Gbps network what a 1-KB packet is to a 1-Mbps network


Page 22: TDC561  Network Programming


Latency/Bandwidth Tradeoff

Throughput = TransferSize / TransferTime
– if it takes 10 ms to transfer 1 MB, the effective throughput is 1 MB / 10 ms = 100 MBps = 800 Mbps

TransferTime = Latency + 1/Bandwidth x TransferSize
– if network bandwidth is 1 Gbps (so it takes 1/1 Gbps x 1 MB = 8 ms to transmit 1 MB), an end-to-end transfer that requires 1 RTT of 100 ms has a total transfer time of 108 ms
– effective throughput is then 1 MB / 108 ms = 74.1 Mbps, not 1 Gbps
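A short C sketch of this arithmetic, reproducing the numbers above (1 MB taken as 8 x 10^6 bits):

    #include <stdio.h>

    int main(void)
    {
        double rtt            = 0.100;  /* 100 ms end-to-end latency */
        double bandwidth_bps  = 1e9;    /* 1 Gbps */
        double transfer_bits  = 8e6;    /* 1-MB file */

        double transfer_time  = rtt + transfer_bits / bandwidth_bps;  /* 0.108 s  */
        double throughput_bps = transfer_bits / transfer_time;        /* ~74 Mbps */

        printf("time = %.3f s, throughput = %.1f Mbps\n",
               transfer_time, throughput_bps / 1e6);
        return 0;
    }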

Page 23: TDC561  Network Programming


Round-Trip Latency (µs)

Per-layer latency (1-byte message): ETH + wire: 216 µs; UDP/IP: 58 µs

Measured round-trip latency vs. message size:

  Message size (bytes)    1   100   200   300   400   500   600   700   800   900  1000
  UDP (µs)              297   413   572   732   898  1067  1226  1386  1551  1719  1878
  TCP (µs)              365   519   691   853  1016  1185  1354  1514  1676  1845  2015

[Figure: measurement setup — App1 and App2 communicating over TCP/UDP, over IP, over ETH/PHY.]

Page 24: TDC561  Network Programming


Throughput (UDP/IP/ETH)

[Figure: throughput (Mbps, 8 to 10) vs. message size (1 KB to 32 KB, log scale); throughput rises with message size and flattens near 9.5 Mbps at around 16 KB.]

Throughput improves as messages get larger, up to the point where the per-message overhead, amortized over the number of bytes, becomes insignificant — at roughly 16-KB messages.

Flattens at ~9.5 Mbps, below the 10-Mbps Ethernet bandwidth.
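A sketch of the simple model behind this curve: each message pays a fixed per-message cost that is amortized over its bits. The 0.7-ms overhead below is an assumed value, chosen only because it roughly reproduces the ~9.5-Mbps plateau at 16-KB messages:

    /* Effective throughput when each message carries a fixed per-message cost. */
    double effective_throughput_bps(double msg_bytes, double overhead_s,
                                    double link_bps)
    {
        double bits = msg_bytes * 8.0;
        return bits / (overhead_s + bits / link_bps);
    }

    /* effective_throughput_bps(16384, 0.0007, 10e6) ~= 9.5e6 (9.5 Mbps) */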

Page 25: TDC561  Network Programming


Notes
– transferring a large amount of data helps improve the effective throughput; in the limit, an infinitely large transfer size causes the effective throughput to approach the network bandwidth
– having to endure more than one RTT hurts the effective throughput for any transfer of finite size, and is most noticeable for small transfers

Page 26: TDC561  Network Programming


Implications

Congestion control
– feedback-based mechanisms require an RTT to adjust
– can send 10 MB in one 100-ms RTT on a 1-Gbps network
– that 10 MB might congest a router and lead to massive losses
– can lose half a delay x bandwidth's worth of data during slow start
– reservations work for continuous streams (e.g., video), but require an extra RTT for bulk transfers

Retransmissions
– retransmitting a packet costs 1 RTT
– dropping even one packet (cell) halves effective bandwidth
– retransmission also implies buffering at the sender
– possible solution: forward error correction (FEC)

Trading bandwidth for latency
– each RTT is precious
– willing to "waste" bandwidth to save latency
– example: pre-fetching

Page 27: TDC561  Network Programming


Host Memory Bottleneck

Issue
– turning host-to-host bandwidth into application-to-application bandwidth
– have to deliver data across I/O and memory buses into cache and registers

Page 28: TDC561  Network Programming


Memory bandwidth
– the I/O bus must keep up with network speed (currently does for STS-12, assuming the peak rate is achievable)
– 114 MBps (measured) is only slightly faster than the I/O bus; can't afford to go across the memory bus twice
– caches are of questionable value (rather small)
– lots of reasons to access buffers:
  » user/kernel boundary
  » certain protocols (reassembly, checksumming)
  » network device and its driver

Same latency/bandwidth problems as high-speed networks

Page 29: TDC561  Network Programming


Integrated Services

High-speed networks have enabled new applications
– they also need "deliver on time" assurances from the network

Applications that are sensitive to the timeliness of data are called real-time applications
– voice
– video
– industrial control

Timeliness guarantees must come from inside the network
– end hosts cannot correct late packets the way they can correct for lost packets

Need more than best-effort
– the IETF is standardizing extensions to the best-effort model