
Layering and TCP/IP Stack

Nicolas Montavont

Universidad de los Andes

Merida, Venezuela

May 2011

ULA - May 2011 - TCP / IP network

About the slides

▪ These slides are part of the Voice over IP module at Universidad de Los Andes, Venezuela.

▪ Many thanks to (in alphabetical order):

• Annie Gravey (Telecom Bretagne)

• David Ros (Telecom Bretagne)

• Emil Ivov (Jitsi)

• G6 - the French IPv6 task force

• German Castignani (Telecom Bretagne)

• Gilbert Martineau (Telecom Bretagne)

• Kurose and Ross - Computer Networking: A Top-Down Approach

• Laurent Toutain (Telecom Bretagne)

• Xavier Lagrange (Telecom Bretagne)


Outline

▪ Introduction by example: an HTTP session

▪ Why layering?

• Layers and modelling

• Encapsulation principle

▪ Standardization

• Overview of the IETF - the Internet Engineering Task Force

▪ The IP layer

• Header

• Discussion on address allocation

▪ Address configuration

• IPv4: DHCP, ARP

• IPv6: Neighbor Discovery

▪ Transport level

• UDP

• TCP


What’s a protocol?

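A human protocol and a computer network protocol:

[Figure: two message exchanges drawn against time. Human protocol: "Hi" / "Hi" / "Got the time?" / "2:00". Network protocol: TCP connection request / TCP connection response / GET http://www.awl.com/kurose-ross / <file>.]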

An example: http

▪ Signaling: establish a TCP connection

▪ "Can I get http://www.rfc-editor.org/rfc/rfc2616.txt?"

• http request: GET http://www.rfc-editor.org/rfc/rfc2616.txt

▪ "Here it is"

• http reply: 200 OK, carrying the requested file
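The two application messages of this exchange can be sketched as plain bytes. This is a minimal sketch, assuming HTTP/1.1 over a TCP connection that is already open; `build_get` and `parse_status_line` are hypothetical helper names, query strings are ignored, and no network I/O is performed.

```python
from urllib.parse import urlsplit


def build_get(url: str) -> bytes:
    """HTTP/1.1 GET request a client could send once the TCP connection is open."""
    parts = urlsplit(url)
    path = parts.path or "/"          # query strings are ignored in this sketch
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.hostname}",
        "Connection: close",
        "",                           # an empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(reply: bytes) -> tuple:
    """Split the first line of a reply such as b'HTTP/1.1 200 OK'."""
    status_line = reply.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get("http://www.rfc-editor.org/rfc/rfc2616.txt")
print(request.decode("ascii").splitlines()[0])        # GET /rfc/rfc2616.txt HTTP/1.1
print(parse_status_line(b"HTTP/1.1 200 OK\r\n\r\n"))  # ('HTTP/1.1', 200, 'OK')
```

The request carries only the path on its first line; the host name travels in the `Host` header, which is what lets one server address host several web sites.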

Protocol layers


Protocol “layers”

▪ Networks are complex, with many pieces:

• Hosts

• Routers

• Links of various media

• Applications

• Protocols

• Hardware, software

How can we organize the structure of a network?


Why layering?

▪ Dealing with complex systems

• An explicit structure allows the identification of the pieces of a complex system and of their relationships

- Layered reference model for discussion

• Modularization eases maintenance and updating of the system

- A change in the implementation of a layer's service is transparent to the rest of the system

- e.g., in the airline-travel analogy, a change in the gate procedure does not affect the rest of the system

▪ But... keep in mind that:

• It is difficult to draw a line between layers and protocols

• Sometimes a layer still needs information from another layer

• Sometimes operations are duplicated


Internet protocol stack

▪ Application: supporting network applications

• FTP, SMTP, HTTP

▪ Transport: process-to-process data transfer

• TCP, UDP

▪ Network: routing of datagrams from source to destination

▪ Link: data transfer between neighboring network elements

• PPP, Ethernet

▪ Physical: bits "on the wire"

[Figure: the five-layer stack, from top to bottom: Application, Transport, Network, Link, Physical]


Protocol stack for the Internet (subset)

[Figure: Internet protocol stack. Link: IEEE 802.3 (+ LLC + SNAP) / Ethernet, PPP. Network: IP, ARP, ICMP. Transport: UDP, TCP. Application: several applications on top.]

▪ The border between layers is not easy to draw

▪ Some protocols are not easy to place (e.g., ICMP, ARP)


Application layer protocols

[Figure: the same stack (IEEE 802.3 (+ LLC + SNAP) / Ethernet, PPP; IP, ARP, ICMP; UDP, TCP), with routing protocols and applications on top: OSPF (directly over IP), ping (over ICMP), RIP (over UDP) and http (over TCP).]


IP network: encapsulation

Example: TCP/IP in a local Ethernet network

[Figure: on the way down the stack (Application → TCP → IP → Ethernet driver → Physical), each layer adds its own header.]

• Application data: data chunk

• TCP segment: TCP header + data chunk

• IP packet: IP header + TCP header + data chunk

• Ethernet frame: Ethernet header + IP header + TCP header + data chunk + CRC
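A toy sketch of this encapsulation: each layer prepends its header to what it receives from the layer above, and the Ethernet layer also appends a trailer. Only the Ethernet header layout (destination, source, type) follows the real format; the "TCP" and "IP" headers are deliberately reduced to a few fields, the function names are hypothetical, `zlib.crc32` stands in for the frame check sequence, and no padding is added.

```python
import struct
import zlib


def tcp_segment(data: bytes, src_port: int, dst_port: int) -> bytes:
    # Toy "TCP header": only the two port numbers (a real TCP header is at least 20 bytes).
    return struct.pack("!HH", src_port, dst_port) + data


def ip_packet(segment: bytes, src: bytes, dst: bytes, protocol: int = 6) -> bytes:
    # Toy "IP header": protocol number plus source and destination addresses.
    return struct.pack("!B4s4s", protocol, src, dst) + segment


def ethernet_frame(packet: bytes, dst_mac: bytes, src_mac: bytes, ethertype: int = 0x0800) -> bytes:
    # Real 14-byte Ethernet header layout: destination, source, type.
    body = struct.pack("!6s6sH", dst_mac, src_mac, ethertype) + packet
    # CRC-32 trailer standing in for the frame check sequence.
    return body + struct.pack("<I", zlib.crc32(body))


data = b"GET / HTTP/1.1\r\n\r\n"                                      # application data
seg = tcp_segment(data, 52000, 80)                                     # TCP segment
pkt = ip_packet(seg, bytes([10, 1, 1, 101]), bytes([10, 1, 2, 101]))   # IP packet
frame = ethernet_frame(pkt, bytes.fromhex("00000c123456"),             # placeholder MACs
                       bytes.fromhex("002608e152c5"))                  # Ethernet frame
print(len(data), len(seg), len(pkt), len(frame))                        # 18 22 31 49
```

Each layer treats everything it receives from above as an opaque payload, which is exactly what lets one layer's implementation change without affecting the others.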


IP network: demultiplexing

Example: TCP/IP in a local Ethernet network

[Figure: demultiplexing tree. Ethernet → IP (type 0x800) or ARP (type 0x806); IP → ICMP (protocol 1), TCP (protocol 6) or UDP (protocol 17); TCP → http (port 80); UDP → RIP (port 520).]

▪ Question: how is the data moved up the protocol stack?

▪ Answer: the headers carry the information

• Ethernet header: Type field

• IP header: Protocol field

• TCP or UDP header: source and destination port number fields
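The walk up the stack can be sketched as follows, assuming an Ethernet frame carrying IPv4. The lookup tables contain only the values shown on this slide, the function names are hypothetical, and `sample_frame` fills every field it does not need with zeros.

```python
import struct

ETHERTYPES = {0x0800: "IP", 0x0806: "ARP"}
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}
PORTS = {("TCP", 80): "http", ("UDP", 520): "RIP"}


def demultiplex(frame: bytes) -> list:
    """Follow the Type, Protocol and port fields up the stack."""
    path = ["Ethernet"]
    (ethertype,) = struct.unpack("!H", frame[12:14])          # Ethernet Type field
    path.append(ETHERTYPES.get(ethertype, hex(ethertype)))
    if ethertype != 0x0800:
        return path
    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4                           # IP header length, in bytes
    transport = IP_PROTOCOLS.get(ip[9], str(ip[9]))           # IP Protocol field
    path.append(transport)
    if transport in ("TCP", "UDP"):
        _src_port, dst_port = struct.unpack("!HH", ip[header_len:header_len + 4])
        path.append(PORTS.get((transport, dst_port), f"port {dst_port}"))
    return path


def sample_frame(ethertype: int, protocol: int = 0, dst_port: int = 0) -> bytes:
    """Hypothetical helper: a frame whose other fields are all zero."""
    ip = bytearray(20)
    ip[0] = 0x45                      # version 4, header length = 5 words
    ip[9] = protocol
    ports = struct.pack("!HH", 40000, dst_port)
    return bytes(12) + struct.pack("!H", ethertype) + bytes(ip) + ports


print(demultiplex(sample_frame(0x0800, 6, 80)))     # ['Ethernet', 'IP', 'TCP', 'http']
print(demultiplex(sample_frame(0x0800, 17, 520)))   # ['Ethernet', 'IP', 'UDP', 'RIP']
print(demultiplex(sample_frame(0x0806)))            # ['Ethernet', 'ARP']
```

The receiving host dispatches on the destination port, since that is the port its own process is listening on.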


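IP network: heterogeneous networks interconnection

[Figure: a web client and a web server (end hosts) connected through a router (node). One side is an Ethernet LAN, the other a PPP serial link. The end hosts run http (application layer) over TCP over IP; the router runs only IP, over Ethernet on one interface and PPP on the other. TCP runs end to end between the two hosts, while IP runs on each hop.]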


The Internet Protocol

▪ Two incompatible versions

• IPv4: the most widely used version today

• IPv6: built on the IPv4 experience

▪ A datagram service

• Routes packets from a source to a destination

• Each packet includes the complete destination address

• Connectionless service

- Each packet is handled independently

- Two consecutive packets may follow different routes


What does IP offer to upper layers?

▪ A datagram service

• Unreliable

- Packets may be lost

- Packets may be duplicated

- Packets may arrive out of order

• Best-effort service


IP networks interconnection

[Figure: hosts A and B in two autonomous systems, AS 1 and AS 2]

▪ Addresses have a global meaning (IPv4)

• There should be no duplicated addresses (except behind a NAT, of course)

▪ IP packet forwarding

• Routers use the addresses carried in the packets

• Routing table: gives the next hop, i.e. the next router towards the destination

Standardization... The IETF


A few standardization bodies

▪ ITU-T

• International standardization body (T stands for Telecommunication)

• Gives the broad guidelines

▪ ETSI - ANSI

• Regional standardization bodies, adapting ITU-T standards to their region

• May now lead with their own proposals (e.g., GSM)

▪ IEEE

• Specifies the physical and link layers of various technologies: 802.3, Bluetooth, Wi-Fi, WiMAX

▪ IETF

• The IP world: network, transport and application layers


IP networks and Internet: standardization

[Figure: organization chart of the Internet standardization bodies]

• ISOC (Internet Society)

• IAB (Internet Architecture Board)

• IETF (Internet Engineering Task Force), steered by the IESG (Internet Engineering Steering Group): Areas, each made of Working Groups (WG)

• IRTF (Internet Research Task Force), steered by the IRSG (Internet Research Steering Group): Research Groups (RG)

• IANA (Internet Assigned Numbers Authority)


The IETF Internet Engineering Task Force

▪ Organized into areas

• E.g.: Transport, Routing

▪ Each area is organized into Working Groups

• E.g.: sip, mext, roll

▪ Principle: "rough consensus and running code"

▪ Open to anyone

• 3 face-to-face meetings per year...

• ... but everything is done on the mailing lists


The IETF Internet Engineering Task Force

▪ Free access to documents

• http://www.ietf.org/

▪ Documents

• Standards: Request For Comments (RFC)

- E.g.: RFC 793 (TCP)

- Permanent

• Working documents: Internet-Drafts

- E.g.: draft-ietf-tsvwg-tcp-eifel-alg-07.txt (Eifel algorithm for TCP)

- Valid for only 6 months


The IETF Internet Engineering Task Force

▪ Standardization process (standards track):

- Submit a personal draft: draft-someone-my-favorite-subject-00.txt

– By e-mail to: [email protected]

- Discussion on the mailing lists and at the IETF meetings

– Try to make the draft a working group item: draft-ietf-xxxwg-my-subject-00.txt

– Reach consensus on the mailing list (working group Last Call)

– Hand the document over to an Area Director

– Last Call across the whole IETF

– If accepted: send it to the RFC Editor (and to IANA, if values need to be allocated)

- RFC: proposed standard, then draft standard, and finally standard


RFCs

▪ Different classes of RFC:

• Documents coming from the standardization process (proposed standard, draft standard, standard)

• Others

- Experimental

- BCP (Best Current Practice)

- Informational

- ...

• Watch out for the date!

- RFC 1149 (April 1st, 1990): A Standard for the Transmission of IP Datagrams on Avian Carriers

- RFC 2549 (April 1st, 1999): IP over Avian Carriers with Quality of Service

- RFC 3514 (April 1st, 2003): the evil bit in the IP header


Standards evolution

▪ In general, one protocol ≠ one RFC

• Example: TCP

- RFC 793 (first specification)

- RFC 1122 (Requirements for Internet Hosts)

- RFC 1323 (Extensions for High Performance)

- RFC 2018, 2883 (Selective Acknowledgment)

- RFC 2581 (Congestion Control)

- RFC 2988 (Retransmission Timer)

- etc.


Research activities: IRTF

▪ http://www.irtf.org/

▪ Looks much further ahead (longer term)

▪ Some groups are private

• Decided by the chair

• Mailing lists are public

▪ Some examples

• End-to-end (closed today)

• Anti-spam

• Virtual networks

• DTN (Delay/Disruption Tolerant Networking)

▪ Nowadays, research is mostly disseminated through scientific publications

Ethernet and 802.3


Expand the link layer

[Figure: the five-layer stack (Application, Transport, Network, Link, Physical), with the Link layer split into two sublayers:]

• LLC (Logical Link Control): error control, flow control

• MAC (Medium Access Control): sharing the link, addressing


Local Area Network

▪ Medium shared by several devices

• Operation mode: broadcast, plus rules defining how to access the medium

- No centralized operation

▪ If the medium is shared, we need

• A way to identify a device → addresses

• Rules to access the medium

▪ Some properties

• No configuration required to operate

• Need not be scalable

• Addresses still need to be unique


Architecture


IEEE 802.3 and Ethernet

▪ Ethernet vs. IEEE 802.3:

• Same physical layer

• Same medium access method

• MAC frame:

- Same format...

- ...but one field is used differently

▪ Two incompatible protocols


IEEE addressing

▪ Made of 2 parts

• Manufacturer part (called the OUI code), bought from the IEEE by a manufacturer; guarantees uniqueness across manufacturers

• Identification part (serial number)

- Must be unique for a given manufacturer

▪ The MAC address format is the same whatever the technology (Ethernet, Wi-Fi, ...)

• Eases network interconnection


MAC address format (IEEE 802.1 standard)

▪ Unique address (worldwide): about 10^14 different addresses

▪ All OUI codes can be looked up at: http://standards.ieee.org/regauth/oui/index.shtml

[Figure: 48-bit MAC address layout: 3 bytes of manufacturer code (OUI), then 3 bytes of serial number]

• First bit, I/G: 0 = I (individual address), 1 = G (group address)

• Second bit, U/L: 0 = U (universal address), 1 = L (local address)

• Remaining 46 bits: the address itself


MAC address format (802.1)

OUI code (3 bytes, hexadecimal)    Manufacturer
00-00-0C                           Cisco
00-03-93                           Apple
02-80-8C                           3Com
08-00-20                           Sun
08-00-5A                           IBM

▪ Three address families, three operating modes

• Point-to-point (unicast): one device

• Broadcast: designates all devices on the network (FF-FF-FF-FF-FF-FF)

• Restricted broadcast (multicast): designates a subset of the devices (first bit of the address equal to 1)
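A small sketch of how the mode, the U/L bit and the OUI can be read from an address. Ethernet sends each byte least-significant bit first, so the "first bit" of the address is bit 0 of the first byte. The function names are hypothetical; 00-00-0C is Cisco's OUI from the table, the serial part 12-34-56 is made up, and 01-00-5E-00-00-01 is a standard IPv4 multicast MAC address used for illustration.

```python
def parse_mac(text: str) -> bytes:
    """Accept '00-00-0C-12-34-56' or '00:00:0c:12:34:56'."""
    mac = bytes(int(part, 16) for part in text.replace(":", "-").split("-"))
    if len(mac) != 6:
        raise ValueError("a MAC address is 6 bytes long")
    return mac


def describe_mac(text: str) -> dict:
    mac = parse_mac(text)
    first = mac[0]
    if mac == b"\xff" * 6:
        mode = "broadcast"
    elif first & 0x01:   # I/G bit = 1 (group): the first bit sent on the wire
        mode = "multicast"
    else:                # I/G bit = 0 (individual)
        mode = "unicast"
    return {
        "mode": mode,
        "locally_administered": bool(first & 0x02),   # U/L bit
        "oui": "-".join(f"{b:02X}" for b in mac[:3]),
        "serial": "-".join(f"{b:02X}" for b in mac[3:]),
    }


print(describe_mac("00-00-0C-12-34-56"))
# {'mode': 'unicast', 'locally_administered': False, 'oui': '00-00-0C', 'serial': '12-34-56'}
print(describe_mac("FF-FF-FF-FF-FF-FF")["mode"])   # broadcast
print(describe_mac("01-00-5E-00-00-01")["mode"])   # multicast
```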


Unicast

[Figure: frame from D to B on a shared medium]

▪ D sends the message on the medium, which is a broadcast medium

▪ The network interfaces of all stations receive the message

▪ Only the interface configured with the destination MAC address passes the message up the stack

• Layer 2 filtering: done by the network card


Access mode: CSMA/CD

▪ Carrier Sense Multiple Access / Collision Detection

• CSMA: before transmitting, the sender senses the channel to detect any ongoing transmission

• CD: while sending, the sender checks whether someone else is also sending at the same time (= collision)

▪ Collision = bad reception

• The frames need to be transmitted again


CSMA/CD: simplified algorithm

1. If the channel is free, send the frame.

2. If the channel is busy, wait until it is idle, then send.

3. If a collision occurs:

a. Stop the transmission

b. Wait a random time and go back to 1
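The three steps can be sketched as a loop. The random wait below uses truncated binary exponential backoff, the rule IEEE 802.3 specifies but the slides leave among the "missing" topics: after the n-th collision of a frame, wait r slot times with r drawn uniformly between 0 and 2^min(n, 10) − 1, and give up after 16 attempts. `channel_busy` and `collision_occurs` are hypothetical callbacks standing in for the hardware, and the waits are recorded rather than actually slept.

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbit/s
MAX_ATTEMPTS = 16     # 802.3 gives up after 16 transmission attempts
BACKOFF_CAP = 10      # the exponent stops growing after 10 collisions


def backoff_delay(collisions: int, rng: random.Random) -> float:
    """Random wait (in microseconds) after the n-th collision of a frame."""
    k = min(collisions, BACKOFF_CAP)
    return rng.randint(0, 2 ** k - 1) * SLOT_TIME_US


def send_frame(channel_busy, collision_occurs, rng: random.Random):
    """Simplified CSMA/CD; returns (attempts used, list of backoff waits).

    channel_busy() and collision_occurs() are hypothetical callbacks standing
    in for carrier sense and collision detection on a real medium.
    """
    waits = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        while channel_busy():          # step 2: wait until the channel is idle
            pass
        if not collision_occurs():     # step 1: the frame went through without collision
            return attempt, waits
        # step 3: collision -> stop, wait a random time, go back to step 1
        # (a real station would wait this long before retrying)
        waits.append(backoff_delay(attempt, rng))
    raise RuntimeError("too many collisions: frame dropped")


outcomes = iter([True, True, False])   # two collisions, then a successful attempt
attempts, waits = send_frame(lambda: False, lambda: next(outcomes), random.Random(1))
print(attempts, len(waits))            # 3 2
```

Doubling the range after each collision spreads the retries of colliding stations further apart as the load (and thus the collision rate) grows.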


Transmission on an idle media

A senses the channel: channel free → A sends


Sending on a busy channel

B senses the channel: channel busy → B waits...

... until the channel is idle

A senses the channel: channel free → A sends


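[Figure: timing diagram between station A and station B, time flowing downward. A senses the channel, finds it idle and sends; its signal reaches B only after the propagation delay. If B senses the channel after the signal has arrived, it detects a signal and waits. If B senses the channel before the signal arrives, it sees an idle channel and sends its frame: the two transmissions collide because of the channel propagation delay, and A detects the collision and stops transmitting.]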


Frame size

▪ The frame size needs both an upper and a lower bound

▪ Maximum size

• Goal: prevent a station from using the channel for too long

• Fixed to 1518 bytes

▪ Minimum size

• Goal: help detect collisions (see next slide)

• Fixed to 64 bytes


If frames were too small

[Figure: timing diagram with stations C, A, B and D on the medium. Maximum propagation delay = τ; frame sending time < 2τ.]

A and B: stations that are far away from each other.

In this example, A and B both consider that they transmitted their messages successfully, but:

• A and B do not detect the collision

• C correctly receives the frame from A, but not the one from B

• D correctly receives the frame from B, but not the one from A

Adding padding (extra bits at the end of the frame) so that the sending time is at least twice the propagation delay avoids this problem.
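The padding rule can be written as a bound. Writing L for the frame length in bits and D for the bit rate (notation introduced here), with τ the maximum propagation delay, then checking it against the 64-byte minimum of the previous slide at 10 Mbit/s:

```latex
T_{\text{send}} = \frac{L}{D} \ge 2\tau
\quad\Longleftrightarrow\quad
L \ge 2\tau D

D = 10\ \text{Mbit/s},\quad L_{\min} = 64\ \text{bytes} = 512\ \text{bits}:\qquad
\frac{512}{10^{7}}\ \text{s} = 51.2\ \mu\text{s}
\;\Rightarrow\; 2\tau \le 51.2\ \mu\text{s}
```

Read the other way round, fixing the minimum frame size also fixes the maximum round-trip propagation delay, and hence the maximum extent of the network.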


If the minimum frame size is bounded...

[Figure: timing diagram with stations A, B, C and D. Example with a minimum frame duration of 2τ + ε. The collision is detected by A at 2τ. B sends jamming data (reinforcing the collision) for a time δ < ε; time marks τ and 2τ + δ.]

In this example, the minimum transmission time is more than twice the propagation delay.

▪ A sees an idle channel and starts transmitting

▪ B sees an idle channel and starts transmitting

▪ B notices the collision almost immediately

▪ B keeps transmitting for a short while, so that the collision is clearly detectable by the other devices


Frame format


Encapsulation at the physical layer

[Figure: a frame on the wire: Preamble (7 bytes) | Start of frame (1 byte) | MAC data (from 64 to 1518 bytes), followed by the minimum inter-frame silence (IFS) before the next frame's preamble. Preamble = 7 × (10101010)₂; start-of-frame delimiter = (10101011)₂.]

▪ Preamble: allows the receiver to synchronize ((10101010)₂ = a square-wave signal in Manchester coding)

▪ Inter-frame silence: separates two successive frames

• 802.3 / Ethernet at 10 Mbit/s: IFS = 9.6 µs


Frame format

Ethernet:
Destination address (6 bytes) | Source address (6 bytes) | Higher-layer protocol number (2 bytes) | Upper-layer data (≥ 0 bytes) | Padding (≥ 0 bytes) | CRC (4 bytes)

IEEE 802.3:
Destination address (6 bytes) | Source address (6 bytes) | Size (2 bytes) | LLC data (≥ 0 bytes) | Padding (≥ 0 bytes) | CRC (4 bytes)

Data + padding: from 46 to 1500 bytes. The CRC is computed over the fields from the destination address to the padding.


How to distinguish 802.3 from Ethernet?

▪ Look at the value coded in the protocol type / length field (data + padding: from 46 to 1500 bytes):

• if ≤ 1500 (0x5DC), then 802.3

• if > 1500 (0x5DC), then Ethernet (protocol codes are always > 1500)

[Figure: the Ethernet and IEEE 802.3 frame formats, with the protocol type / length field highlighted]
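A sketch of the receiver-side test and of the sender-side padding, following the slide's threshold. In the current IEEE 802.3 standard, type codes start at 0x0600 (1536) and the values 1501 to 1535 are undefined; the sketch keeps the slide's simpler rule. The function names are hypothetical.

```python
MIN_DATA = 46     # bytes of data + padding in a 64-byte frame
MAX_DATA = 1500


def frame_kind(type_or_length: int) -> str:
    """Classify a frame from the 2-byte field that follows the source address."""
    if type_or_length <= 0x05DC:   # <= 1500: it is a length -> IEEE 802.3
        return "IEEE 802.3"
    return "Ethernet"              # > 1500: it is a protocol type -> Ethernet


def pad_data(data: bytes) -> bytes:
    """Append zero padding so the data field reaches the 46-byte minimum."""
    if len(data) > MAX_DATA:
        raise ValueError("more than 1500 bytes of data")
    return data + bytes(max(0, MIN_DATA - len(data)))


print(frame_kind(0x0800), "/", frame_kind(0x0806), "/", frame_kind(100))
# Ethernet / Ethernet / IEEE 802.3
print(len(pad_data(b"hello")))   # 46
```

Once `pad_data` has run, the receiver can no longer tell the 5 original bytes from the 41 padding bytes, which is the problem raised on the next slide.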


Ethernet and the Network layer

▪ At the MAC layer, we don't know!

• The padding bytes are passed to the upper layer along with the data, and so much for the layered architecture!

• The upper-layer PDU MUST therefore have a Length field

[Figure: Ethernet frame format (destination address, source address, higher-layer protocol number, upper-layer data, padding, CRC), asking: is this only data, or data and padding?]


Missing...

▪ More details on CSMA/CD and the backoff algorithm

▪ LLC - Logical Link Control

▪ Cable types and bandwidth

▪ Switches

▪ Bridges - spanning tree algorithm

▪ Virtual LANs (VLAN)

The IP layer


IPv4 header format

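[Figure: IPv4 header, 32 bits per row, 20 bytes without options:]

• Bits 0-3: Version = 4; bits 4-7: header length; bits 8-15: Type of service; bits 16-31: Total length (in bytes)

• Identification (16 bits); Flags (bits 16-18); Fragment offset (bits 19-31)

• Time to Live (TTL, 8 bits); Protocol (8 bits); Header checksum (16 bits)

• IP source address (32 bits)

• IP destination address (32 bits)

• Options (if any) + padding

• Data (from the upper layer)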
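A sketch of reading the fixed 20-byte part of this header with Python's `struct` module, including the header checksum check. The sample header is a widely circulated worked example (a UDP packet from 192.168.0.1 to 192.168.0.199), not taken from the slides, and the function names are hypothetical.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's complement of the one's-complement sum of 16-bit words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total > 0xFFFF:                         # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    (ver_ihl, tos, total_length, ident, flags_offset,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_length = (ver_ihl & 0x0F) * 4          # the field counts 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_length": header_length,
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
        "df": bool(flags_offset & 0x4000),
        "mf": bool(flags_offset & 0x2000),
        "fragment_offset": flags_offset & 0x1FFF,  # in 8-byte units
        "ttl": ttl,
        "protocol": protocol,
        # Summing a header that holds a correct checksum gives 0xFFFF, so this yields 0.
        "checksum_ok": ipv4_checksum(packet[:header_length]) == 0,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


header = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
info = parse_ipv4_header(header)
print(info["version"], info["header_length"], info["total_length"], info["ttl"], info["protocol"])  # 4 20 115 64 17
print(info["src"], "->", info["dst"], info["checksum_ok"])  # 192.168.0.1 -> 192.168.0.199 True
```

Since the TTL changes at every hop, each router must update this checksum before forwarding the packet.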


Length fields

▪ Header length (4 bits): in 32-bit words

• Maximum size (with options) = 60 bytes → at most 40 bytes of options

▪ Total length (16 bits)

• Theoretical maximum = 2^16 − 1 = 65,535 bytes

• IP over Ethernet: needed to distinguish the information from the padding

[Figure: Ethernet frame with protocol = IP (0x800): destination address | source address | protocol | IP packet (≥ 0 bytes) | padding (≥ 0 bytes) | CRC. Only the IP packet is sent to the upper layer.]


Type of service (TOS)

▪ Current definition (RFC 2474, 3168): differentiated services (DiffServ) + explicit congestion notification (ECN)

• DSCP field:

- 6 values for Class Selector (compatible with the former Precedence field)

- 12 values for Assured Forwarding

- 1 value for Expedited Forwarding

[Figure: the 8-bit field, bits 0 to 7: bits 0-5 = DiffServ Code Point (DSCP), bits 6-7 = ECN]
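Since the DSCP occupies the six high-order bits and ECN the two low-order bits, splitting or building the byte is a shift and a mask. A minimal sketch with hypothetical function names; 46 is the Expedited Forwarding code point (RFC 3246), commonly used for voice traffic.

```python
def split_tos(tos: int) -> tuple:
    """Return (DSCP, ECN) from the 8-bit former Type of Service byte."""
    return tos >> 2, tos & 0b11


def make_tos(dscp: int, ecn: int = 0) -> int:
    """Build the byte from a 6-bit DSCP and a 2-bit ECN value."""
    if not (0 <= dscp < 64 and 0 <= ecn < 4):
        raise ValueError("DSCP is 6 bits, ECN is 2 bits")
    return (dscp << 2) | ecn


EF = 46                      # Expedited Forwarding code point
print(hex(make_tos(EF)))     # 0xb8
print(split_tos(0xB8))       # (46, 0)
```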


Fragmentation

▪ If the link MTU is too small to carry the packet

• Send the packet in fragments

▪ Reassembly is performed by the destination

▪ Fragmentation is costly for routers

Drawing: [Tanenbaum, 2002]


Fragmentation: header fields

▪ Identification: unique number (per sender)

• If a fragment is fragmented again, all fragments still carry this number

▪ Position of the fragment: position of the fragment's first byte in the original datagram

• Expressed in units of 8 bytes, so fragments are cut at multiples of 8 bytes

▪ DF (don't fragment) = 1: the packet must not be fragmented

• If fragmentation is needed, the packet is discarded and an ICMP message is returned to the source

▪ MF (more fragments)

• MF = 0: last fragment

▪ Default flags (for a non-fragmented packet): DF = MF = 0

Identification (16 bits) | 0 (1 bit) | DF (1 bit) | MF (1 bit) | Position of the fragment (13 bits)


Fragmentation: an example

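[Figure: path E → R1 → R2 → D. Maximum data per packet: 4096 bytes on E-R1, 1024 bytes on R1-R2, 512 bytes on R2-D.]

Columns: Identification | 0 | DF | MF | Position

• Sent by E (2021 bytes of data): 123456 | 0 | 0 | 0 | 0

• After R1:

- 123456 | 0 | 0 | 1 | 0 (1024 bytes)

- 123456 | 0 | 0 | 0 | 128 (997 bytes)

• After R2:

- 123456 | 0 | 0 | 1 | 0 (512 bytes)

- 123456 | 0 | 0 | 1 | 64 (512 bytes)

- 123456 | 0 | 0 | 1 | 128 (512 bytes)

- 123456 | 0 | 0 | 0 | 192 (485 bytes)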
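The example can be replayed with a short sketch. It models only the header fields shown on the slide (no bytes on the wire), and `Fragment` and `fragment` are hypothetical names. Every fragment except the last carries a multiple of 8 bytes because the position is expressed in 8-byte units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    ident: int      # Identification, copied into every fragment
    df: bool        # Don't Fragment
    mf: bool        # More Fragments
    offset: int     # Position of the fragment, in 8-byte units
    size: int       # bytes of data carried


def fragment(pkt: Fragment, max_data: int) -> list:
    """Split a packet (or an existing fragment) for a link carrying at most max_data bytes."""
    if pkt.size <= max_data:
        return [pkt]
    if pkt.df:
        raise ValueError("DF = 1: discard the packet and return an ICMP message")
    chunk = max_data - max_data % 8     # all fragments but the last: a multiple of 8 bytes
    pieces = []
    for start in range(0, pkt.size, chunk):
        size = min(chunk, pkt.size - start)
        is_last_piece = start + size == pkt.size
        pieces.append(Fragment(
            ident=pkt.ident,
            df=False,
            # MF stays 1 unless this is the last piece of the last original fragment
            mf=pkt.mf or not is_last_piece,
            offset=pkt.offset + start // 8,
            size=size,
        ))
    return pieces


original = Fragment(ident=123456, df=False, mf=False, offset=0, size=2021)
after_r1 = fragment(original, 1024)                          # R1 -> R2 link: 1024 bytes max
after_r2 = [f for x in after_r1 for f in fragment(x, 512)]   # R2 -> D link: 512 bytes max
for f in after_r2:
    print(f.ident, int(f.df), int(f.mf), f.offset, f.size)
# 123456 0 1 0 512
# 123456 0 1 64 512
# 123456 0 1 128 512
# 123456 0 0 192 485
```

Because offsets are always relative to the original datagram, the destination can reassemble the four final fragments without knowing that R1 and R2 both fragmented the packet.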


Time To Live field (TTL)

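▪ Initialized with a value > 0

• Typical value = 64

▪ Decremented by 1:

• Each time the packet crosses a router

• Once per second while the packet waits for reassembly at the destination

▪ When the TTL reaches 0, the packet is discarded, so a packet cannot loop forever

[Figure: hosts A and B with routers R1, R2 and R3 forming a routing loop (when routing tables are erroneous)]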


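Protocol field

[Figure: the stack with its demultiplexing values. Link: Ethernet / SNAP, with type = 0x800 for IP and type = 0x806 for ARP. Network: IP, ARP, ICMP; the IP Protocol field is 1 for ICMP, 6 for TCP and 17 for UDP. Transport: UDP, TCP. Application: ping, traceroute, DHCP.]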


IP addresses

▪ Why do we need addresses?

• For identification, then?

• For location only!

▪ How many addresses per node? Who gets an address?

▪ A possible analogy

• A fixed-line telephone number? Almost...


IP Addresses

▪ 32 bits

• Human representation: 4 blocks of 1 byte, separated by dots

• Decimal representation

▪ Includes a subnet (network) part and an interface part

• Delimited by the netmask: 131.254.100.48/24

[Figure: 131.254.100.48 = 10000011.11111110.01100100.00110000. The 32-bit netmask is a variable-length run of 1s covering the prefix, followed by 0s covering the identifier: 11111...11111 000...0000.]
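Python's standard `ipaddress` module can split the example address into its prefix and identifier. A minimal sketch (the variable names are arbitrary):

```python
import ipaddress

iface = ipaddress.ip_interface("131.254.100.48/24")
net = iface.network
host_id = int(iface.ip) & int(net.hostmask)   # bits not covered by the netmask

print(f"{int(iface.ip):032b}")  # 10000011111111100110010000110000
print(net.netmask)              # 255.255.255.0
print(net)                      # 131.254.100.0/24  (prefix)
print(host_id)                  # 48  (identifier)
```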


IP addresses: subnets

[Figure: a router connects two subnets. Net1: PC1 10.1.1.101/24, PC2 10.1.1.102/24, PC3 10.1.1.103/24, router interface 10.1.1.1/24. Net2: PC4 10.1.2.101/24, PC5 10.1.2.102/24, router interface 10.1.2.1/24.]


IP addresses and netmask: example

Which part belongs to the network, and which one belongs to the interface?

[Figure: an IP address split into its Network and Interface parts. Drawing: [Toutain, 2003]]


Particular IP addresses

▪ Loopback: no packets are sent on the network

▪ Broadcast: reaches all nodes on the local network

• Mapping between IP broadcast and MAC broadcast

[Figure, from [Tanenbaum, 2002]: special address forms: this host; a host in this network; broadcast on the local network; broadcast on a distant network (network part followed by all 1s); loopback (127 followed by any value)]


What do we use?

[Figure: what we want to use (names) vs. what the network actually uses (addresses). The name resolution system, the Domain Name System (DNS), bridges the two: "Where is ietf.org?" / "It is at 56.54.29.4".]


IP addresses: allocation and thoughts

▪ IP address: unique identification with a global scope

• Addressing needs to be scalable

▪ 32 bits: in theory, 2^32 ≈ 4 billion different addresses

• Fixed size to simplify routing decisions

- Addresses are handled for every packet (datagram!)

▪ How can IP addresses be allocated?

• Random choice?

- What about duplicates (address uniqueness)?

- Routing?

• Fixed allocation to nodes?

- What about nomadism?

• An efficient allocation system is needed

- Finite address space: no waste!


IP addresses allocation

▪ Before 1994

• Classful addressing

▪ After 1994

• Classless addressing, called CIDR (Classless Inter-Domain Routing)

▪ Why do we always need more?*

• Mobile devices

• "Always-on" connections, compared to dial-up modems

• Internet demographics

• Inefficient use of the address space

• Virtualization

*Source: Wikipedia - http://en.wikipedia.org/wiki/IPv4_address_exhaustion


IP address classes (before 1994)

Drawing: [Tanenbaum, 2002]

▪ 126 class A networks of ≈ 16 × 10^6 machines each

▪ About 16 × 10^3 class B networks of ≈ 65 × 10^3 machines each

▪ About 2 × 10^6 class C networks of ≈ 250 machines each
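Under classful addressing, the class, and therefore the length of the network part, was read directly from the leading bits of the first byte. A sketch of that rule (the function name is hypothetical; classes D and E have no network/host split):

```python
import ipaddress

DEFAULT_PREFIX = {"A": 8, "B": 16, "C": 24}   # network part of each unicast class


def address_class(addr: str) -> str:
    """Classful rule (before 1994): the leading bits of the first byte give the class."""
    first = int(ipaddress.IPv4Address(addr)) >> 24
    if first < 0b1000_0000:   # 0xxxxxxx
        return "A"
    if first < 0b1100_0000:   # 10xxxxxx
        return "B"
    if first < 0b1110_0000:   # 110xxxxx
        return "C"
    if first < 0b1111_0000:   # 1110xxxx (multicast)
        return "D"
    return "E"                # 1111xxxx (reserved)


for a in ("10.1.1.101", "131.254.100.48", "193.52.10.1", "224.0.0.1"):
    c = address_class(a)
    print(a, c, f"/{DEFAULT_PREFIX[c]}" if c in DEFAULT_PREFIX else "")
```

This fixed /8, /16 or /24 split is what made the choice between a class C and a class B so coarse.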


IP addressing (before 1994)

▪ Flat addressing space

• No hierarchical numbering

• Managed by a central entity, the Network Information Center

• No relation between an address and a geographic position: simplifies administration

- 128.92/16 = IntelliCorp (US)

- 128.93/16 = INRIA (France)

▪ However...

• Very inefficient use of the address space

- Choice between a class C (/24: 254 possible hosts) and a class B (/16: about 65,000 possible hosts)

- Depletion of the class B address space...

▪ Evolution

• CIDR: classless addresses

• Private addresses + NAT

• IPv6 (128-bit addresses)


Network Address Translation

▪ Basics (RFC 3022)

• Share IP address(es) among several users

• The number of IP addresses is usually smaller than the number of clients

- Example: a company network

▪ Network Address and Port Translation (NAPT)

• The port number is used to tell users apart

• Inside the LAN, each device has a different (private) IP address

– Typically 10.0.0.0/8

• From the outside, only a small number of IP addresses are seen

– Public (routable) addresses

▪ Interface between the LAN and the Internet = NAT box

• Dynamic translation of the addresses of incoming and outgoing packets

• Typical implementation (home networking): NAT + firewall (+ router (+ Wi-Fi AP))


NAT: an illustration

[Figure, source: [Tanenbaum, 2002]: the NAT box replaces the private source address with the public source address. Translation table entry: private source address 10.0.1.2, private source port 63378 → public source port 51031.]
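The translation table of the figure can be sketched as two dictionaries, one per direction. This is a toy model under stated assumptions: `Napt` is a hypothetical class, 198.51.100.7 is a documentation address standing in for the public address, and real NAT boxes also track the protocol and the remote endpoint and expire idle entries.

```python
import itertools


class Napt:
    """Toy NAPT table mapping (private address, private port) <-> public port."""

    def __init__(self, public_ip: str, first_port: int = 50000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)   # next free public port
        self._out = {}   # (private ip, private port) -> public port
        self._in = {}    # public port -> (private ip, private port)

    def outgoing(self, src_ip: str, src_port: int) -> tuple:
        """Source (address, port) written into a packet leaving the LAN."""
        key = (src_ip, src_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._in[port] = key
        return self.public_ip, self._out[key]

    def incoming(self, dst_port: int) -> tuple:
        """Private (address, port) for a reply; unknown ports raise KeyError (dropped)."""
        return self._in[dst_port]


nat = Napt("198.51.100.7", first_port=51031)
print(nat.outgoing("10.0.1.2", 63378))   # ('198.51.100.7', 51031)
print(nat.incoming(51031))               # ('10.0.1.2', 63378)
```

Because the reverse entry is created only by an outgoing packet, an unsolicited incoming packet finds no mapping, which is why a NAT box also behaves like a simple firewall.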


Address management - post-1994

▪ 1st idea: strict rules for address allocation (RFC 1466)

• Restrict the use of class B addresses with specific rules

• Allocate contiguous blocks of class C addresses

- Contiguous blocks of class C addresses were allocated to the RIRs, which could then allocate smaller contiguous blocks to LIRs / ISPs

▪ 2nd idea: forget the classes! (RFC 1518, 1519)

• CIDR (Classless Inter-Domain Routing)

• Allocate the remaining addresses in blocks of variable size, depending on the needs

• Try to recover already allocated prefixes to reallocate them according to these (new) rules

▪ Decentralized organization

• IANA manages the global pool of addresses

• Regional centers receive blocks and redistribute them to ISPs


CIDR: principle

▪ Address => prefix + prefix length

• No more classes: from now on, the first bits of the address carry no special meaning

▪ Hierarchy of administrative bodies

• Coordination: IANA (Internet Assigned Numbers Authority)

• Regional Internet Registries (RIR) (and then LIR / ISP)


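Prefix allocation

[Figure, [Toutain, 2003]: hierarchical prefix allocation, IANA → RIPE-NCC → ISPs → sites]

• IANA to RIPE-NCC: 62/8, 80/7, 193/8, 194/7

• ISP 1: 62.125/16; Site 1: 62.125.44.128/25; Site 2: 62.125.50/24

• ISP 2: 195.44/14; Site 3: 195.46.216/21

• ISP 3: 195.47/16; Site 4: 195.47.172/22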
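The nesting of the prefixes in this figure can be checked with Python's `ipaddress` module (`subnet_of` needs Python 3.7 or later). The prefixes are written out in full (62.125/16 becomes 62.125.0.0/16), and each site is paired with the ISP prefix that covers it; the dictionary names are illustrative.

```python
import ipaddress

RIR_BLOCKS = ["62.0.0.0/8", "80.0.0.0/7", "193.0.0.0/8", "194.0.0.0/7"]   # IANA -> RIPE-NCC
ALLOCATIONS = {                  # site prefix -> covering ISP prefix
    "62.125.44.128/25": "62.125.0.0/16",
    "62.125.50.0/24": "62.125.0.0/16",
    "195.46.216.0/21": "195.44.0.0/14",
    "195.47.172.0/22": "195.47.0.0/16",
}

rir = [ipaddress.ip_network(p) for p in RIR_BLOCKS]
for site, isp in ALLOCATIONS.items():
    site_net, isp_net = ipaddress.ip_network(site), ipaddress.ip_network(isp)
    in_rir = any(isp_net.subnet_of(block) for block in rir)
    print(f"{site_net} ({site_net.num_addresses} addresses) in {isp_net}: "
          f"{site_net.subnet_of(isp_net)}, ISP prefix in an RIR block: {in_rir}")
```

This containment is what makes CIDR scale: outside RIPE-NCC's region, routers only need the four short RIR prefixes, not one entry per site.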


Today's view of available addresses... none!

▪ Check the daily report

• http://www.potaroo.net/tools/ipv4/index.html

[Figure: projected exhaustion dates. Exhaustion of the IANA pool: Feb-2011. RIR unallocated address pool exhaustion: 15-Apr-2011.]

The end of the Internet??


Did you forget about IPv6???

▪ A new version of the Internet Protocol

• Benefits from the long experience of the Internet

▪ An enabler!

▪ 2001:660:3003:1:0:0:6543:210F

▪ Longer, much longer addresses: 128 bits

• 3.4 × 10^38 addresses

• 60,000 trillion trillion addresses per inhabitant of the Earth

• An address for every grain of sand in the world

So what is IPv6?

IPv6 address format

▪ A 16-byte global IPv6 address:

• 2001:0660:3003:0001:0000:0000:6543:210F

• Hexadecimal notation: groups of 2 bytes separated by colons

▪ Compact form

▪ Remove the leading zeros of each word: 2001:660:3003:1:0:0:6543:210F

▪ To avoid ambiguity, replace only one sequence of zero words by "::": 2001:660:3003:1::6543:210F

• An IPv4 address may also appear as ::FFFF:123.12.34.56
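The two compaction rules can be sketched directly and compared with the standard library. Two choices go beyond the slide and follow the usual convention (RFC 5952): "::" replaces the longest run of at least two zero words (the leftmost one on a tie), and hexadecimal digits are written in lowercase. `compress_ipv6` is a hypothetical name.

```python
import ipaddress


def compress_ipv6(full: str) -> str:
    """Rule 1: drop the leading zeros of each word. Rule 2: replace one run of zero words by '::'."""
    words = [format(int(w, 16), "x") for w in full.split(":")]   # rule 1 (also lowercases)
    if len(words) != 8:
        raise ValueError("expected 8 words of 16 bits")
    best_start, best_len = -1, 1          # only runs of 2 or more zero words qualify
    i = 0
    while i < 8:
        if words[i] == "0":
            j = i
            while j < 8 and words[j] == "0":
                j += 1
            if j - i > best_len:          # strictly longer: the leftmost run wins a tie
                best_start, best_len = i, j - i
            i = j
        else:
            i += 1
    if best_start < 0:
        return ":".join(words)
    head = ":".join(words[:best_start])
    tail = ":".join(words[best_start + best_len:])
    return f"{head}::{tail}"              # rule 2, applied once only


addr = "2001:0660:3003:0001:0000:0000:6543:210F"
print(compress_ipv6(addr))                                        # 2001:660:3003:1::6543:210f
print(ipaddress.IPv6Address(addr).compressed)                     # 2001:660:3003:1::6543:210f
print(ipaddress.IPv6Address("::FFFF:123.12.34.56").ipv4_mapped)  # 123.12.34.56
```

Using "::" only once is what keeps the notation unambiguous: with two of them, a reader could not tell how many zero words each one stands for.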


IPv6 addressing scheme

▪ RFC 4291 defines the current IPv6 address architecture

• Loopback (::1)

• Link-local (fe80::/10)

• Global unicast (2000::/3)

• Multicast (ff00::/8)

▪ Uses CIDR principles:

• Prefix / prefix length

• 2001:660:3003::/48

• 2001:660:3003:2:a00:20ff:fe18:964c/64

▪ Interfaces have several IPv6 addresses

• At least a link-local and a global unicast address


IPv6 address format

▪ Global unicast address

▪ Link-local address


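IPv6 header

[Figure: IPv6 header, 32 bits per row, 40 bytes in total: Ver. (4 bits) | Traffic Class (8 bits) | Flow label (20 bits); Payload length (16 bits) | Next Header (8 bits) | Hop Limit (8 bits); Source Address (128 bits); Destination Address (128 bits)]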

Address configuration


What, how, who?

▪ What do you need?

• An address => how do you get it? (it depends on your point of attachment)

• The router address

• A DNS server

▪ 3 ways

• Manual (not practical!)

• Automatic

• Dynamic

▪ IPv4

• DHCP

▪ IPv6

• Neighbor Discovery - address autoconfiguration

• DHCP


Auto-configuration: DHCP

[Figure: protocol stack. Link: Ethernet / IEEE 802.3. Network: IP, ARP, ICMP. Transport: UDP, TCP. Application: ping, traceroute, DHCP.]


DHCP (Dynamic Host Configuration Protocol)

▪ Dynamic Host Configuration Protocol

- RFC 1541, RFC 2131 / 2132

- Derived from an older protocol called BOOTP

• Functions

- IP address allocation

- IP stack configuration

- Initialization of various parameters

- Retrieval of a boot file

• Runs on top of UDP


DHCPv4: address allocation

[Figure: IP address allocation types, static vs. dynamic: manual, automatic, dynamic]

▪ Three types of address allocation

• Automatic

- An IP address is selected from a pool and permanently allocated to a client

• Manual

- The administrator uses a configuration file

- Maps an IP address to a MAC address

- May be used for security reasons

• Dynamic

- The address is selected from a pool

- For a given period of time (lease)

- Method used by ISPs


DHCPv4: how does it work?

[Figure: a DHCP client and two DHCP servers (A and B) on the LAN 95.7.8.0/23, whose access router (AR) is 95.7.8.1. Messages are shown as (source, destination, type):]

1. (0.0.0.0, 255.255.255.255, DISCOVER)

2. OFFER from server A and from server B

3. (0.0.0.0, 255.255.255.255, REQUEST)

4. ACK from the selected server

Configuration obtained by the client: IP address 95.7.9.45, netmask 255.255.254.0, gateway (GW) 95.7.8.1, WINS address, …


DHCPv4: messages

▪ Requests from clients

• DISCOVER (1)

- Asks for an address allocation

- Lists the parameters requested by the client (domain name, network mask, DNS, etc.)

• REQUEST (3)

- Response to an OFFER message, or renewal of an allocation

- The servers that were not chosen release the parameters they had offered

• DECLINE (4)

- Indicates that the offered address is already in use

• RELEASE (7)

- Frees an IP address


DHCPv4: messages

▪ Responses from a server

• OFFER (2)

- Response to a DISCOVER message

- Includes the first parameters

• ACK (5)

- Acknowledgement, including the parameters and the IP address allocated to the client

• NAK (6)

- Signals the end of an allocation, or bad parameters

▪ The message type is coded as an option

• Option 53
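A sketch of how option 53 is encoded and found. DHCP options use a code / length / value layout (RFC 2132), except for the one-byte pad (0) and end (255) options. Option 1 (subnet mask) is included only to show the scan skipping another option, with the netmask of the DHCP example above. The helper names are hypothetical, and the fixed BOOTP part of the message and the magic cookie that precede the options are left out.

```python
DHCP_MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                      5: "ACK", 6: "NAK", 7: "RELEASE"}
TYPE_CODES = {name: code for code, name in DHCP_MESSAGE_TYPES.items()}
PAD, SUBNET_MASK, MESSAGE_TYPE, END = 0, 1, 53, 255   # option codes


def encode_message_type(name: str) -> bytes:
    """Option 53 as code, length, value."""
    return bytes([MESSAGE_TYPE, 1, TYPE_CODES[name]])


def find_message_type(options: bytes) -> str:
    """Walk a DHCP options field and return the message type from option 53."""
    i = 0
    while i < len(options):
        code = options[i]
        if code == PAD:          # single-byte option, no length
            i += 1
            continue
        if code == END:
            break
        length = options[i + 1]
        if code == MESSAGE_TYPE:
            return DHCP_MESSAGE_TYPES[options[i + 2]]
        i += 2 + length          # skip code, length and value
    raise ValueError("option 53 not found")


options = (bytes([SUBNET_MASK, 4, 255, 255, 254, 0])   # netmask 255.255.254.0
           + encode_message_type("OFFER")
           + bytes([END]))
print(find_message_type(options))   # OFFER
```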


What else?

[Figure: host 95.7.9.45 on the LAN 95.7.8.0/23 (router 95.7.8.1) wants to send a packet to 95.7.9.40.]

• IP header: IP source 95.7.9.45, IP destination 95.7.9.40

• Ethernet header: MAC source 00:26:08:e1:52:c5, MAC destination: ??


Address Resolution Protocol

[Figure: the TCP/IP stack. Application: ping, traceroute, DHCP; Transport: UDP, TCP; Network: IP, ARP, ICMP; Link: Ethernet / IEEE 802.3]


ARP

! Address Resolution Protocol (RFC 826)

! Mapping between network addresses (IP) and MAC addresses

• Applications only handle IP addresses

• Frames are exchanged using MAC addresses

! We need to know the destination MAC address to send a frame

! May also be used for duplicate address detection

! Dynamic cache

• Built and updated by the system

• Each entry has a finite lifetime

morrocoy[15:24]% arp -a

default-gw.irisa.fr (131.254.1.1) at 0:4:80:13:69:0

air.irisa.fr (131.254.60.130) at 8:0:20:89:58:95

sky.irisa.fr (131.254.60.147) at 8:0:20:ac:44:3

cuvert1.irisa.fr (131.254.70.14) at 8:0:11:13:99:e5
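Here is a toy Python model of such a cache, in which each entry expires after a fixed lifetime. The 60 s default and the class name are illustrative assumptions; real lifetimes depend on the operating system. The clock can be injected so that expiry can be shown without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ArpCache:
    """Toy ARP cache: IPv4 address -> (MAC address, expiry time)."""

    def __init__(self, lifetime: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime = lifetime      # illustrative value, in seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def update(self, ip: str, mac: str) -> None:
        """Add or refresh an entry, e.g. when an ARP response is received."""
        self._entries[ip] = (mac, self.clock() + self.lifetime)

    def lookup(self, ip: str) -> Optional[str]:
        """Return the MAC address, or None if unknown or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if self.clock() >= expires:   # finite lifetime: drop the stale entry
            del self._entries[ip]
            return None
        return mac


if __name__ == "__main__":
    now = [0.0]                                   # fake clock for the demo
    cache = ArpCache(lifetime=60.0, clock=lambda: now[0])
    cache.update("131.254.1.1", "0:4:80:13:69:0")
    print(cache.lookup("131.254.1.1"))            # 0:4:80:13:69:0
    now[0] = 61.0
    print(cache.lookup("131.254.1.1"))            # None -> an ARP request is needed
```

A lookup that returns None is the point at which the system would issue an ARP request.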


How does it work?

!If the destination address is not known (not in the table) => issue an ARP request: an Ethernet frame sent in broadcast

[Figure: the sender broadcasts an ARP request "Who has IP = x?"; the destination (IP = x, MAC = y) replies with a point-to-point ARP response "That's me (MAC address = y)"]
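Here is a Python sketch of the broadcast frame that carries the ARP request. It follows the RFC 826 packet layout for Ethernet and IPv4: a 28-byte ARP payload after a 14-byte Ethernet header. The addresses reuse the "What else?" example, and the function names are hypothetical. Padding to the minimum Ethernet frame size and the FCS are left to the network interface.

```python
import ipaddress
import struct


def mac_bytes(mac: str) -> bytes:
    """'00:26:08:e1:52:c5' -> 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Broadcast Ethernet frame carrying the ARP request 'Who has target_ip?'."""
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                        # hardware type: Ethernet
        0x0800,                                   # protocol type: IPv4
        6, 4,                                     # hardware / protocol address lengths
        1,                                        # operation: 1 = request, 2 = reply
        mac_bytes(sender_mac),                    # sender MAC address
        ipaddress.IPv4Address(sender_ip).packed,  # sender IP address
        bytes(6),                                 # target MAC: unknown, all zeros
        ipaddress.IPv4Address(target_ip).packed,  # target IP address (the question)
    )
    ethernet = struct.pack("!6s6sH",
                           b"\xff" * 6,            # destination: broadcast
                           mac_bytes(sender_mac),  # source
                           0x0806)                 # EtherType: ARP
    return ethernet + arp


if __name__ == "__main__":
    frame = build_arp_request("00:26:08:e1:52:c5", "95.7.9.45", "95.7.9.40")
    print(len(frame), frame.hex())
```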


ARP: duplicate address detection (1/2)

!When I configure an IP address => send an ARP request: a broadcast Ethernet frame

[Figure: while configuring IP address x, the source broadcasts an ARP request "Who has IP = x?"; a device already using that address (IP = x, MAC = y) replies point-to-point "That's me (MAC address = y)" => the address is already in use]


ARP: duplicate address detection (2/2)

!When I configure an IP address => send an ARP request: a broadcast Ethernet frame

[Figure: while configuring IP address x, the source broadcasts an ARP request "Who has IP = x?"; no response arrives within 1 s => OK, I can use the address]


IPv6 address configuration


Addressing scheme

!RFC 4291 defines current IPv6 addresses

• loopback (::1)

• link local (fe80::/10)

• global unicast (2000::/3)

• multicast (ff00::/8)

!Use CIDR principles

• Prefix / Prefix length notation

• 2001:660:3003::/48

• 2001:660:3003:2:a00:20ff:fe18:964c/64

! Interfaces have several IPv6 addresses

• At least a link local and a global unicast address
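You can check these categories and the prefix notation with Python's standard `ipaddress` module. This sketch tests membership in the prefixes listed above rather than relying on the module's built-in `is_*` properties. The function name is hypothetical.

```python
import ipaddress

# Address types and prefixes listed above (RFC 4291)
ADDRESS_TYPES = [
    ("loopback", ipaddress.IPv6Network("::1/128")),
    ("link-local", ipaddress.IPv6Network("fe80::/10")),
    ("global unicast", ipaddress.IPv6Network("2000::/3")),
    ("multicast", ipaddress.IPv6Network("ff00::/8")),
]


def address_type(address: str) -> str:
    """Return the first listed category whose prefix contains the address."""
    addr = ipaddress.IPv6Address(address)
    for name, prefix in ADDRESS_TYPES:
        if addr in prefix:
            return name
    return "other"


if __name__ == "__main__":
    iface = ipaddress.IPv6Interface("2001:660:3003:2:a00:20ff:fe18:964c/64")
    print(iface.network)                             # 2001:660:3003:2::/64
    print(address_type(str(iface.ip)))               # global unicast
    print(address_type("fe80::216:cbff:feb9:50bd"))  # link-local
```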


An IPv6 address - 128 bits

[Figure: structure of a 128-bit global unicast address and of a link-local address; example: 2001:660:3003:1:34CA:3B73:6543:210F/64]


Let’s assume you want to configure it yourself

!You need:

• Prefix information

• Interface ID


Interface ID assignment

! Derived from a L2 ID (e.g., the MAC address)

! Manually assigned

• To keep the same address when the Ethernet card or the host is changed

• To remember the address easily
- 1, 2, 3...
- Last digit of the IPv4 address

! Random value

• Changed frequently (e.g., every day, per session, at reboot)

• Improves privacy (the host cannot be tracked through a stable interface ID)

! Hash or other value

• To link the address to other properties
- Public key
- List of assigned prefixes

• Mainly for security purposes
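Here is a Python sketch of the manual, random and hash-based options, assuming a /64 prefix. The hash-based variant only illustrates the idea of binding the interface ID to a public key and a prefix. It is not the Cryptographically Generated Addresses algorithm (RFC 3972), and the key is a placeholder. The function names are hypothetical.

```python
import hashlib
import ipaddress
import secrets


def random_iid() -> int:
    """Random 64-bit interface ID, to be regenerated periodically."""
    return secrets.randbits(64)


def hashed_iid(prefix: str, public_key: bytes) -> int:
    """Illustrative hash-based interface ID bound to a prefix and a public key.
    NOT the CGA algorithm of RFC 3972: only the general idea."""
    net = ipaddress.IPv6Network(prefix)
    digest = hashlib.sha256(net.network_address.packed + public_key).digest()
    return int.from_bytes(digest[:8], "big")   # keep 64 bits


def make_address(prefix: str, iid: int) -> ipaddress.IPv6Address:
    """Concatenate a /64 prefix with a 64-bit interface ID."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64 or not 0 <= iid < 2**64:
        raise ValueError("expected a /64 prefix and a 64-bit interface ID")
    return net.network_address + iid


if __name__ == "__main__":
    prefix = "2001:660:3003:2::/64"
    print(make_address(prefix, 1))               # manual: 2001:660:3003:2::1
    print(make_address(prefix, random_iid()))    # changes at every run
    print(make_address(prefix, hashed_iid(prefix, b"placeholder-public-key")))
```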


Neighbor Discovery

!IETF protocols

• RFC 4861 : Neighbor Discovery for IPv6

• RFC 4862 : IPv6 Stateless Address Autoconfiguration

• RFC 4135 : Goals of Detecting Network Attachment in IPv6

!Mechanisms

• Router discovery

• Prefix discovery

• Address resolution

• Address Auto-configuration


Router Advertisement

!Periodic message sent by routers

!Message also sent in response to a Router Solicitation

!ICMPv6 message

[Figure: Router Advertisement message]


Possible Options


! Source Link Layer address

• Source link layer address of the router

! MTU

• Indicates the recommended size for the MTU on the link

! Prefix Information

• Indicates the prefix(es) used on the link
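Here is a minimal Python parser for a Router Advertisement and the three options above. It follows the RFC 4861 layout: a 16-byte fixed part, followed by options coded as type/length/value, with the length counted in units of 8 octets. It is only a sketch: it does not verify the ICMPv6 checksum, it skips unknown options, and the lifetimes in the demo are arbitrary example values.

```python
import ipaddress
import struct


def parse_router_advertisement(msg: bytes) -> dict:
    """Parse an ICMPv6 Router Advertisement (type 134) and some of its options."""
    (icmp_type, _code, _checksum, hop_limit, flags, router_lifetime,
     reachable_time, retrans_timer) = struct.unpack_from("!BBHBBHII", msg, 0)
    if icmp_type != 134:
        raise ValueError("not a Router Advertisement")
    ra = {
        "cur_hop_limit": hop_limit,
        "managed": bool(flags & 0x80),        # M flag: addresses via DHCPv6
        "other_config": bool(flags & 0x40),   # O flag: other parameters via DHCPv6
        "router_lifetime": router_lifetime,   # seconds
        "reachable_time": reachable_time,     # milliseconds
        "retrans_timer": retrans_timer,       # milliseconds
        "prefixes": [],
    }
    offset = 16                               # options follow the 16-byte fixed part
    while offset + 2 <= len(msg):
        opt_type, opt_len = msg[offset], msg[offset + 1]
        if opt_len == 0:
            raise ValueError("invalid option length 0")
        body = msg[offset:offset + 8 * opt_len]   # length in units of 8 octets
        if opt_type == 1:                         # Source Link-Layer Address
            ra["source_lladdr"] = body[2:8].hex(":")
        elif opt_type == 5:                       # MTU
            ra["mtu"] = struct.unpack_from("!I", body, 4)[0]
        elif opt_type == 3:                       # Prefix Information
            plen, pflags, valid, preferred = struct.unpack_from("!BBII", body, 2)
            prefix = ipaddress.IPv6Address(body[16:32])
            ra["prefixes"].append({
                "prefix": f"{prefix}/{plen}",
                "on_link": bool(pflags & 0x80),     # L flag
                "autonomous": bool(pflags & 0x40),  # A flag: usable for auto-configuration
                "valid_lifetime": valid,
                "preferred_lifetime": preferred,
            })
        offset += 8 * opt_len                     # unknown options are skipped
    return ra


if __name__ == "__main__":
    header = struct.pack("!BBHBBHII", 134, 0, 0, 64, 0, 1800, 0, 0)
    slla = bytes([1, 1]) + bytes.fromhex("0016cbb950bd")
    mtu = struct.pack("!BBHI", 5, 1, 0, 1500)
    pio = (struct.pack("!BBBBIII", 3, 4, 64, 0xC0, 86400, 14400, 0)
           + ipaddress.IPv6Address("2001:660:7301:d170::").packed)
    print(parse_router_advertisement(header + slla + mtu + pio))
```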


Router and prefix discovery

! Identification of the prefix(es) used on the link

• Determine the destinations that are on-link

• Prefix used for address auto-configuration

! Identification of the on-link router(s) and of the default router

! The information from one router is only part of the information about the link

• The reception of an RA must not erase the previous configuration


Receipt of an RA

!The node adds the source address of the RA to its router list

!If the Source Link-Layer Address option is present, the node records the association between the router's link-local address and its link-layer address in the Neighbor Cache

!Prefix option

• If the on-link (L) flag is set, the prefix is used on the link

• Add this prefix to the list of prefixes


Next hop determination

! Usage of a set of tables

• Destination cache

• Prefix list

• Default router list

• Neighbor cache

! Algorithm

• Look for the destination address in the Destination Cache

• Compare the destination IP address with prefixes => longest prefix match

- If the destination is on-link, the next hop is the destination address itself

- If the destination is not on-link, the node selects a router as a next hop

• Determination of the link layer address

• Memorization of the choice in the Destination Cache
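The algorithm can be sketched in Python as follows, under a few simplifying assumptions:

- The Prefix List only holds on-link prefixes, so any match means "on-link" and the longest-prefix match reduces to a membership test.
- The first default router is always chosen.
- Link-layer address resolution is left out.

The class, the method and the router address fe80::1 are hypothetical.

```python
import ipaddress
from typing import Dict, List

IPv6 = ipaddress.IPv6Address


class NextHopSelector:
    """Toy next-hop determination using a Destination Cache, a Prefix List
    and a Default Router List (link-layer resolution is not modelled)."""

    def __init__(self, on_link_prefixes: List[str], default_routers: List[str]) -> None:
        self.prefix_list = [ipaddress.IPv6Network(p) for p in on_link_prefixes]
        self.default_routers = [IPv6(r) for r in default_routers]
        self.destination_cache: Dict[IPv6, IPv6] = {}

    def next_hop(self, destination: str) -> IPv6:
        dst = IPv6(destination)
        # 1. Destination Cache: reuse a previous decision
        if dst in self.destination_cache:
            return self.destination_cache[dst]
        # 2. Compare with the on-link prefixes
        if any(dst in prefix for prefix in self.prefix_list):
            hop = dst                          # on-link: the destination itself
        elif self.default_routers:
            hop = self.default_routers[0]      # off-link: a default router
        else:
            raise LookupError(f"no next hop for {destination}")
        # 3. Memorize the choice; the next hop's link-layer address would then
        #    be looked up in the Neighbor Cache (or resolved with a Neighbor Solicitation)
        self.destination_cache[dst] = hop
        return hop


if __name__ == "__main__":
    selector = NextHopSelector(["2001:660:7301:d170::/64", "fe80::/64"], ["fe80::1"])
    print(selector.next_hop("2001:660:7301:d170::42"))  # on-link: 2001:660:7301:d170::42
    print(selector.next_hop("2001:db8::1"))             # off-link: fe80::1
```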


Link layer address determination

!Use of the following messages

• Neighbor Advertisement

• Neighbor Solicitation

!Assumption: The node knows the destination IP address, and looks for the link layer address

• Send a Neighbor Solicitation

• Receive a Neighbor Advertisement from the target destination

• Record the correspondence between the IP address and the link layer address in the Neighbor Cache


Auto-configuration

! Goal: create a valid address for a given link, without human intervention

! Several steps

• Create a link-local address by concatenating the link-local prefix and an interface identifier derived from the link-layer address of the interface
- Prefix: fe80::/64
- Link-layer address: 00:16:cb:b9:50:bd
- EUI-64 identifier of the interface: 02:16:cb:ff:fe:b9:50:bd
- Generated address: fe80::216:cbff:feb9:50bd

• Check that the address is unique: Duplicate Address Detection (DAD)
- Send a Neighbor Solicitation targeting the newly created address

• Receive the prefix sent by the router (Router Advertisement)

• Create the global IP address by concatenating the prefix with the interface identifier derived from the link-layer address
- Prefix: 2001:660:7301:d170::/64
- Link-layer address: 00:16:cb:b9:50:bd
- Global address: 2001:660:7301:d170:216:cbff:feb9:50bd

• Optionally, check that the global IP address is unique


Link layer address - EUI-64 Conversion
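The conversion used in the auto-configuration example works in two steps: insert ff:fe in the middle of the 48-bit MAC address, then invert the universal/local bit. It can be sketched in Python as follows; the function names are hypothetical.

```python
import ipaddress


def mac_to_eui64_iid(mac: str) -> int:
    """48-bit MAC -> 64-bit (modified EUI-64) interface identifier."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    eui64 = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])  # insert ff:fe in the middle
    eui64[0] ^= 0x02                                         # invert the universal/local bit
    return int.from_bytes(eui64, "big")


def autoconf_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Concatenate a /64 prefix with the interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    return net.network_address + mac_to_eui64_iid(mac)


if __name__ == "__main__":
    mac = "00:16:cb:b9:50:bd"
    print(autoconf_address("fe80::/64", mac))                # fe80::216:cbff:feb9:50bd
    print(autoconf_address("2001:660:7301:d170::/64", mac))  # 2001:660:7301:d170:216:cbff:feb9:50bd
```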


Neighbor Unreachability Detection

!Check whether a device is reachable

• In case of mobility, check that the previous router is still on-link

!Send a Neighbor Solicitation to the target IP address


Stateless auto-configuration

Time t=0: the router is configured with a link-local address and manually configured with a global address (the prefix P::/64 is given by the network manager)

[Figure: router (FE80::IID1, P::IID1/64) and host (FE80::IID2, P::IID2/64) on the same link. t=0: router is configured]

Stateless auto-configuration

The host attaches to the link and builds its link-local address from the interface MAC address

[Figure: t=0: router is configured; t=1: link-local address configuration]

Stateless auto-configuration

The host performs DAD (i.e., it sends a Neighbor Solicitation to query the resolution of its own address: no answer means that no other host uses this value)

No answer after a timeout => ok

[Figure: the host sends a Neighbor Solicitation (FE80::IID2). t=0: router is configured; t=1: link-local address configuration; t=2: DAD on the link-local address]

Stateless auto-configuration

The host sends a Router Solicitation to the all-routers multicast group (ff02::2), using its newly configured link-local address

[Figure: the host sends a Router Solicitation. t=0 to t=2 as above; t=3: request for a Router Advertisement]

Stateless auto-configuration

The router answers the host directly, using link-local addresses. The answer may contain one or several prefixes. The router can also instruct hosts to use DHCPv6 to obtain addresses (stateful auto-configuration) and/or other parameters (DNS servers, ...).

[Figure: the router sends a Router Advertisement (P). t=0 to t=3 as above; t=4: reception of the Router Advertisement]

Stateless auto-configuration

The host performs DAD (i.e., it sends a Neighbor Solicitation to query the resolution of its own global address: no answer means that no other host uses this value)

No answer after a timeout => ok

[Figure: the host sends a Neighbor Solicitation (P::IID2). t=0 to t=4 as above; t=5: DAD on the global address]

Stateless auto-configuration

The host sets the global address and configures the answering router as its default router

[Figure: t=0: router is configured; t=1: link-local address configuration; t=2: DAD on the link-local address; t=3: request for a Router Advertisement; t=4: reception of a Router Advertisement; t=5: DAD on the global address; t=6: global address and default router configured]

In summary... Neighbor Discovery

!Determine the link-layer addresses of neighbors

!Address auto-configuration (stateful and stateless)

• Layer 3 parameters (IPv6 address, default route, MTU and hop limit)

!Duplicate Address Detection (DAD)

!Maintain neighbor reachability information

The transport layer


Let’s start with User Datagram Protocol

[Figure: the TCP/IP stack. Application: ping, traceroute, DHCP; Transport: UDP, TCP; Network: IP, ARP, ICMP; Link: Ethernet / IEEE 802.3]


UDP

!User Datagram Protocol (RFC 768)

!Applications/protocols using UDP

• NFS (Network File System)

• DNS (Domain Name System)

• DHCP (Dynamic Host Configuration Protocol)

• Multimedia applications (interactive voice and video, streaming)


What UDP provides

!Minimal support at the transport level

• No retransmission of lost packets, no flow control, no congestion control

!A datagram service

• No connection

• Unreliable
- Reliability may be implemented by the application


UDP datagrams and IP packets

!Information unit (PDU) = datagram

!No segmentation

• Each block of data generated by the application is encapsulated in a single UDP datagram

• Potential risk of fragmentation at the IP layer

[Figure: IP packet = IP header + UDP datagram; UDP datagram = UDP header + data]


UDP header

[Figure: UDP header, 8 bytes, drawn as 32-bit rows (bits 0 to 31): source port number (16 bits) | destination port number (16 bits); length (16 bits) | checksum (16 bits); followed by the data (if any)]

!Port number: demultiplexing


Checksum

!UDP checksum: optional over IPv4 (mandatory in TCP, and for UDP over IPv6)
• If not used: checksum = 0

!Method (same as TCP)
• One's complement sum of all 16-bit words, then one's complement of the result

• Computed over the UDP header + data + an IP « pseudo-header »

[Figure: fields covered by the checksum, as 32-bit rows. Pseudo-header: source IP address; destination IP address; 0 (8 bits) | Protocol (= 17) (8 bits) | Length (16 bits). UDP header: source port number | destination port number; length | checksum. Data, followed by a zero padding byte if needed (only for the computation)]
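Here is a Python sketch of the checksum computation for UDP over IPv4, with the pseudo-header built as in the figure. The function names are hypothetical. Following RFC 768, a computed checksum of 0 is transmitted as 0xFFFF, since 0 means "no checksum".

```python
import ipaddress
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum of data, padded with a zero byte if its length is odd."""
    if len(data) % 2:
        data += b"\x00"                           # padding used only for the computation
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def udp_datagram(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> bytes:
    """Build a UDP datagram (header + data) with its checksum, for IPv4."""
    length = 8 + len(payload)                     # UDP header is 8 bytes
    pseudo_header = (ipaddress.IPv4Address(src_ip).packed
                     + ipaddress.IPv4Address(dst_ip).packed
                     + struct.pack("!BBH", 0, 17, length))  # zero, protocol 17, UDP length
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum = 0 while computing
    checksum = ~ones_complement_sum16(pseudo_header + header + payload) & 0xFFFF
    if checksum == 0:
        checksum = 0xFFFF                         # 0 would mean "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


if __name__ == "__main__":
    datagram = udp_datagram("95.7.9.45", "95.7.9.40", 5060, 5060, b"hello")
    print(datagram.hex())
```

A receiver can verify a datagram by applying `ones_complement_sum16` to the pseudo-header followed by the received datagram, checksum included. The result must be 0xFFFF.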


Outline: Transmission Control Protocol

[Figure: the TCP/IP stack. Application: ping, traceroute, DHCP; Transport: UDP, TCP; Network: IP, ARP, ICMP; Link: Ethernet / IEEE 802.3]


Outline

!TCP characteristics

!TCP connection

!Flow control

!Congestion control


Introduction to TCP

!Examples of applications / protocols using TCP
• HTTP (web), SMTP / POP / IMAP (e-mail), FTP, telnet, ssh...

• YouTube

!Share of TCP traffic in the Internet (McCreary and Claffy, 2000)
• 91% of bytes

• 83% of packets

!Documents from the IETF
• RFC 793 (Base specification)

• RFC 1122 (Requirements for Internet Hosts)

• RFC 2581 (Congestion Control)

• RFC 2988 (Retransmission Timer)

• RFC 1323 (Extensions for High Performance)

• RFC 2018, 2883 (Selective Acknowledgment)

• …

!De facto standard: the BSD implementation
• Several thousand lines of code


What is TCP providing to the applications?

!A transparent, error-free, bidirectional channel that transports a sequence of bytes

• End-to-end protocol

• Reliable

• Byte-stream

• Connection-oriented protocol


End-to-end

[Figure: a web client and a web server (end hosts) connected through a router (node). HTTP (application layer) and TCP (end-to-end) run only on the end hosts; the IP protocol runs on each hop; the links use the Ethernet protocol (LAN) and the PPP protocol (serial link)]


Reliable service

!TCP assumes that the network is not reliable
• Acknowledgement of the received data

• Retransmission of lost data

• Flow control

• Reordering of data arriving out of order

• Discarding of duplicate data

• Check data integrity (checksum)


Byte stream

!TCP transports bytes

• Unstructured flow: TCP may segment data arbitrarily
- TCP does not preserve message boundaries for the upper layer; that is up to the application

• Each transmitted byte has a sequence number
- Allows detecting losses and identifying what is acknowledged


TCP segments and IP datagrams

!Information unit (PDU) = segment

• Application data are split into blocks, transported as TCP segments

• Each TCP segment is encapsulated in an IP datagram

[Figure: IP datagram = IP header + TCP segment; TCP segment = TCP header + data]


TCP header

[Figure: TCP header, 20 bytes without options, drawn as 32-bit rows (bits 0 to 31): source port number | destination port number; sequence number; acknowledgment number (ACK); length (bits 0-3) | reserved (4-7) | ECN (8-9) | flags (10-15) | window size (16-31); checksum | pointer to urgent data; TCP options (if any); data (if any)]


Port numbers

!Well-known
• From 0 to 1023

• Used for common services
- Examples: telnet server (23), ssh (22), http (80)...

!Registered
• From 1024 to 49151
- Examples: Quake (26000), SIP (5060)

!« Temporary » (dynamic)
• From 49152 to 65535

• Dynamically allocated for the application
- Typically a client, such as an http client or an ssh client
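Here is a tiny Python helper that maps a port number to the ranges above; the function name is hypothetical. Operating systems may pick ephemeral ports from a different range. Linux, for instance, defaults to 32768-60999.

```python
def port_category(port: int) -> str:
    """Classify a TCP/UDP port number according to the IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit values")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


if __name__ == "__main__":
    for port in (22, 80, 5060, 26000, 50000):
        print(port, port_category(port))
```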


Flags from the TCP header

• URG: the pointer to urgent data field is valid

• ACK: the Ack Number field holds a valid value

• PSH: the TCP receiver must quickly pass the data to the upper layers

• RST: reset the connection

• SYN: synchronize the sequence numbers at both ends of the connection

• FIN: the TCP sender has no more data to send

!Used for signaling
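In the header, these flags are single bits of the 14th byte (offset 13). The two high-order bits of that byte are the ECN bits (CWR, ECE). Here is a Python sketch that decodes them; the helper names are hypothetical.

```python
from enum import IntFlag
from typing import List


class TcpFlag(IntFlag):
    FIN = 0x01
    SYN = 0x02
    RST = 0x04
    PSH = 0x08
    ACK = 0x10
    URG = 0x20


def flag_names(flags: int) -> List[str]:
    """Names of the flags set in a flags value, lowest bit first."""
    return [flag.name for flag in TcpFlag if flags & flag]


def flags_from_header(tcp_header: bytes) -> List[str]:
    """Byte 13 of the header holds CWR, ECE (ECN), URG, ACK, PSH, RST, SYN, FIN."""
    return flag_names(tcp_header[13] & 0x3F)   # mask out the two ECN bits


if __name__ == "__main__":
    print(flag_names(0x02))   # ['SYN']         first segment of the handshake
    print(flag_names(0x12))   # ['SYN', 'ACK']  second segment
    print(flag_names(0x11))   # ['FIN', 'ACK']
```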


Sequence number and acknowledgement

!Sequence number = first byte of data in the segment

!Ack number = next byte that the receiver (the sender of the ACK) expects to receive

!Full-duplex connection

• A sequence number for each direction

[Figure: A sends B a segment with Sequence number = X carrying N bytes of data [X, X+N–1]; B answers with ACK number = X+N; A then sends Sequence number = X+N with M bytes of data [X+N, X+N+M–1]]


Ack mechanism

!Positive acknowledgement
• The receiver confirms what it has received in sequence
- It does not spontaneously notify that data is missing
- It does not explicitly indicate what is missing

!« Cumulative » property
• An ACK may acknowledge more than one received segment

• Delayed ACK mechanism: e.g., send one ACK for every other segment

[Figure: positive cumulative acknowledgement. ACK X, then segments [X, X+N-1] and [X+N, X+N+M-1], acknowledged together by ACK X+N+M; segments [Y, Y+J-1] and [Y+J, Y+J+K-1], acknowledged together by ACK Y+J+K]


Retransmission and acknowledgement of TCP segments

[Figure: the application writes N bytes (seq = X), then M bytes (seq = X+N) and K bytes (seq = X+N+M). Segment seq = X is lost: the receiver buffers the out-of-sequence data and answers ACK X each time. When the RTO expires, the sender retransmits seq = X; the receiver answers ACK X+N+M+K and delivers the N+M+K bytes to the application in the right order]

!Reliable service = retransmission on loss
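The receiver side of this scenario can be sketched in Python:

- Out-of-sequence segments are buffered.
- The ACK number is always the next byte expected.
- Data is delivered in order once the hole is filled.

The sketch assumes that segments never overlap. The class and method names are hypothetical.

```python
from typing import Dict


class TcpReceiver:
    """Toy TCP receiver: cumulative ACKs and in-order delivery (no overlapping segments)."""

    def __init__(self, initial_seq: int) -> None:
        self.next_expected = initial_seq     # ACK number = next byte expected
        self.out_of_order: Dict[int, bytes] = {}
        self.delivered = bytearray()         # bytes handed to the application, in order

    def on_segment(self, seq: int, data: bytes) -> int:
        """Process a segment and return the ACK number to send back."""
        if seq > self.next_expected:
            self.out_of_order[seq] = data    # a hole precedes it: buffer
        elif seq + len(data) > self.next_expected:
            new = data[self.next_expected - seq:]   # skip bytes already received
            self.delivered += new
            self.next_expected += len(new)
            # The hole may now be filled: deliver the buffered segments that follow
            while self.next_expected in self.out_of_order:
                chunk = self.out_of_order.pop(self.next_expected)
                self.delivered += chunk
                self.next_expected += len(chunk)
        # otherwise: duplicate data, discarded
        return self.next_expected            # cumulative acknowledgement


if __name__ == "__main__":
    X, N, M, K = 1000, 100, 200, 50          # arbitrary values
    receiver = TcpReceiver(X)
    print(receiver.on_segment(X + N, b"b" * M))      # 1000: segment X was lost
    print(receiver.on_segment(X + N + M, b"c" * K))  # 1000
    print(receiver.on_segment(X, b"a" * N))          # 1350 = X+N+M+K (retransmission)
```

The values X = 1000, N = 100, M = 200 and K = 50 are arbitrary. The segments arrive as seq 1100, then seq 1300, then the retransmitted seq 1000. The receiver answers ACK 1000, ACK 1000 and finally ACK 1350, which is X+N+M+K.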


Outline

!TCP characteristics

!Connection

• Opening and closing

!Flow control

!Congestion control


Connection set up and tear down

!Before sending data, a connection must be established

• Signaling : three-way handshake

!Typical phases of a TCP connection

• Establishment

• Data exchange

• Closing


Opening the connection

[Figure: the server application performs a passive opening and the client application an active opening. Client → server: SYN j; server → client: SYN k, ACK j+1; client → server: ACK k+1; the connection is then established at both ends]

Three segments to open the connection: three-way handshake


Closing the connection

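[Figure: the client application performs an active closing: client → server FIN m; server → client ACK m+1, and the server notifies the closing to its application. The server application then performs a passive closing: server → client FIN n; client → server ACK n+1; the closing is notified to the client application]

Four segments to close the connection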


Outline

!TCP characteristics

!Connection

!Flow control
• Sliding window

!Congestion control


Flow control

!End-to-end

!Objectives

• Prevent the sender from sending data faster than the TCP receiver can receive it

• Make better use of the network capacity

!How is this done?

• TCP controls the rate at which segments are sent


Flow control in TCP

!« Sliding window »

• Idea:
- The sender can send data without having received an ack for what has already been sent (efficient usage of the capacity)...
- ... as long as the receiver is able to receive new segments (flow control)

• Principle: each TCP end advertises the number of bytes it is ready to receive, starting from the ack number

[Figure: the sender's view of bytes numbered 1 to 18 and of the window indicated by the receiver. Each byte is either sent and acknowledged, sent but not acknowledged, may be sent without delay (useful window), or cannot be sent yet]
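The sender can express the four categories of the figure with three values it knows:

- the last ack number received,
- the next sequence number to send,
- the window advertised by the receiver.

Here is a Python sketch with hypothetical names. The ack number 4, next byte 7 and window 6 used in the demo are arbitrary.

```python
def byte_status(n: int, ack: int, next_seq: int, window: int) -> str:
    """Status of byte n from the sender's point of view.
    ack: last ack number received (first unacknowledged byte),
    next_seq: next byte to send, window: window advertised by the receiver."""
    if n < ack:
        return "sent and acknowledged"
    if n < next_seq:
        return "sent but not acknowledged"
    if n < ack + window:
        return "may be sent without delay"
    return "cannot be sent yet"


def useful_window(ack: int, next_seq: int, window: int) -> int:
    """Number of bytes that may still be sent without waiting for an ACK."""
    return max(0, ack + window - next_seq)


if __name__ == "__main__":
    for n in range(1, 19):                   # bytes 1 to 18, as in the figure
        print(n, byte_status(n, ack=4, next_seq=7, window=6))
    print("useful window:", useful_window(ack=4, next_seq=7, window=6))  # 3
```

With these values:

- bytes 4 to 6 are sent but not acknowledged,
- bytes 7 to 9 may be sent without delay (useful window = 3),
- bytes from 10 on cannot be sent yet.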


Sliding window: evolution over time

!The receiver acknowledges correctly received data => the window closes
• Header: Ack number (32 bits)

!The receiver reads (acknowledged) data => the window opens
• Header: Window size (16 bits)

!Closing + opening = the window moves forward (slides)

[Figure: the sending window, closing on its left edge and opening on its right edge]

Sliding window: an example

Opening of the connection:
1. SYN 0
2. SYN 0, ACK 1, MSS = 1024, window = 4096 (initial window)
3. ACK 1

Data transfer:
4. [1, 1024]
5. [1025, 2048]
6. [2049, 3072]
7. ACK 2049, window = 4096 (slide)
8. ACK 3073, window = 3072 (closing)
9. [3073, 4096]
10. ACK 4097, window = 4096 (opening and slide)
11. [4097, 5120]
12. [5121, 6144]
13. [6145, 7168]
14. ACK 6145, window = 4096
15. [7169, 8192]


Flow control with sliding window

!Intuitively:

• The higher the bandwidth, and/or

• The longer the RTT,

• The larger the window must be to allow the sender to send data continuously
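This intuition is the bandwidth-delay product. To keep sending continuously, the window must cover at least bandwidth × RTT bytes, and a window W limits the throughput to about W / RTT. Here is a Python sketch with hypothetical function names:

```python
def min_window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Smallest window (in bytes) that lets the sender transmit continuously."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput: at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    print(min_window_bytes(10e6, 0.1))      # 10 Mbit/s, 100 ms RTT
    print(max_throughput_bps(65535, 0.1))   # largest window without window scaling
```

For example, a 10 Mbit/s path with a 100 ms RTT needs a window of at least 125,000 bytes. On the same RTT, the 16-bit window field alone (at most 65,535 bytes, without the RFC 1323 extensions) limits the connection to about 5.2 Mbit/s.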


Outline

!TCP characteristics

!Connection

!Flow control

!Congestion control
• Algorithms: slow start, congestion avoidance, fast retransmit, fast recovery


Congestion control

! Packets are in the network

• On the links

• In the router queues

! Example: intermediate link at low rate; window = 20
• Spacing between ACKs: given by the slowest link

• If the R1 queue is full => packet loss

[Figure: sender → router R1 → bottleneck link → router R2 → receiver, with data packets 1 to 20 in flight and ACKs flowing back [Stevens, 1994]]


About congestion...

!The sliding window guarantees that the receiver is not flooded

!Problem: what if the bottleneck is in the network, and not in the receiver?
• Possible causes
- Slow link
- Sum of the incoming flows > capacity of the link

!(In principle…) We cannot know the state of the intermediate nodes
• Their state keeps changing


Congestion control in TCP

!Set up by the sender
• Mechanisms
- Congestion detection
- Reaction to the detection

!(Up to) four algorithms work together
• Slow start

• Congestion avoidance

• Fast retransmit

• Fast recovery

!Sending window: wnd = min(rwnd, cwnd)
• rwnd := receiver window size
- Flow control by the receiver

• cwnd := congestion window
- Flow control by the sender


Congestion control algorithms

!Slow start
• Start with precautions
- cwnd = 1 or 2 segments
– At the beginning of the connection (we don't know the status of the network!)
– When the timeout expires

• Use the ACK arrival rate to adapt the sending rate (self-clocking)
- Lightly loaded network
– Quick response from the network
- Loaded network
– ACKs take more time to arrive => the sending rate increases more slowly

!Congestion avoidance
• Once the congestion point has been reached, (try to) avoid reaching it again

• But still try to use the bandwidth efficiently


Slow start

!Algorithm
• With each new ACK received: cwnd := cwnd + 1

!Without losses, cwnd increases (quasi-)exponentially
• Growth rate: (roughly) × 2 per RTT

[Figure: slow start, cwnd (in TCP segments) growing from 1 to 16 over successive RTTs]
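Here is a Python sketch of this per-ACK rule over successive RTTs. It makes three simplifying assumptions:

- every segment in the window is acknowledged by its own ACK (no delayed ACKs),
- there is no loss,
- the number of segments sent per RTT is wnd = min(rwnd, cwnd).

The function names are hypothetical.

```python
from typing import List


def send_window(rwnd: int, cwnd: int) -> int:
    """wnd = min(rwnd, cwnd), in segments."""
    return min(rwnd, cwnd)


def slow_start_rounds(initial_cwnd: int, rounds: int, rwnd: int) -> List[int]:
    """cwnd (in segments) at the start of each RTT, assuming no loss and one ACK per segment."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = send_window(rwnd, cwnd)   # segments sent (and ACKed) during this RTT
        for _ in range(acks):
            cwnd += 1                    # slow start: cwnd := cwnd + 1 per new ACK
    return history


if __name__ == "__main__":
    print(slow_start_rounds(1, 5, rwnd=1000))   # [1, 2, 4, 8, 16]
```

With a large receiver window, `slow_start_rounds(1, 5, rwnd=1000)` gives [1, 2, 4, 8, 16], the doubling per RTT described above.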


cwnd evolution: textbook case

[Figure: cwnd vs. time (RTTs), starting from 1, with ssthresh marked. Phases: slow start; fast recovery + congestion avoidance after each network congestion (shown twice); after an RTO interval, slow start + congestion avoidance]
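A deliberately simplified, Reno-like model in Python can reproduce the shape of this textbook curve. It works at the granularity of one RTT:

- cwnd doubles while cwnd < ssthresh,
- it grows by 1 segment per RTT in congestion avoidance,
- it is halved on three duplicate ACKs (fast retransmit + fast recovery),
- it restarts from 1 after an RTO.

The initial ssthresh of 16 segments and the event sequence are arbitrary assumptions. Details of RFC 5681, such as the temporary window inflation during fast recovery, are left out. The function name is hypothetical.

```python
from typing import List, Optional


def reno_trace(events: List[Optional[str]], ssthresh: float = 16.0) -> List[float]:
    """cwnd (in segments) at the start of each RTT for a simplified Reno-like sender.
    events[i] describes what happens during RTT i: None (no loss),
    "dupacks" (three duplicate ACKs) or "timeout" (the RTO expires)."""
    cwnd = 1.0
    trace = []
    for event in events:
        trace.append(cwnd)
        if event == "timeout":
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = 1.0                       # back to slow start
        elif event == "dupacks":
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = ssthresh                  # fast retransmit + fast recovery: halve cwnd
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: doubling per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 segment per RTT
    return trace


if __name__ == "__main__":
    events = [None] * 5 + ["dupacks"] + [None] * 3 + ["timeout"] + [None] * 3
    print(reno_trace(events))
```

With the event sequence in the demo, cwnd evolves as follows:

- 1, 2, 4, 8, 16: slow start up to ssthresh,
- 17: congestion avoidance,
- 8.5, 9.5, 10.5, 11.5: after the duplicate ACKs,
- 1, 2, 4: after the timeout.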


cwnd evolution: more realistic version

[Figure: cwnd vs. time (RTTs), starting from 0, with network congestion events, an RTO and successive ssthresh values]