Layering and TCP/IP Stack
Nicolas [email protected]
Universidad de los Andes
Merida, Venezuela
May 2011
ULA - May 2011page TCP / IP network
About the slides
These slides are part of the module on Voice over IP at Universidad de Los Andes, Venezuela
Many thanks to (in alphabetical order)
• Annie Gravey (Telecom Bretagne)
• David Ros (Telecom Bretagne)
• Emil Ivov (jitsi)
• G6 - the French IPv6 task force
• German Castignani (Telecom Bretagne)
• Gilbert Martineau (Telecom Bretagne)
• Kurose and Ross - Computer Networking: A Top-Down Approach
• Laurent Toutain (Telecom Bretagne)
• Xavier Lagrange (Telecom Bretagne)
Outline
Introduction by an example: an HTTP session
Why layering?
• Layers and modelling
• Encapsulation principle
Standardization
• Overview of the IETF - the Internet Engineering Task Force
The IP layer
• Header
• Discussion on address allocation
Address configuration
• IPv4: DHCP, ARP
• IPv6: Neighbor Discovery
Transport level
• UDP
• TCP
What’s a protocol?
A human protocol and a computer network protocol, side by side (time runs downward):
• Human protocol: "Hi" / "Hi" / "Got the time?" / "2:00"
• Network protocol: TCP connection request / TCP connection response / Get http://www.awl.com/kurose-ross / <file>
An example: http
1. Signaling: establish a TCP connection
2. The client asks: "Can I get http://www.rfc-editor.org/rfc/rfc3261.txt?"
3. http request: http get http://www.rfc-editor.org/rfc/rfc2616.txt
4. The server answers: "Here it is"
5. http reply: http 200 OK
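The request and reply of this example can be written out as plain text messages. The following is a minimal sketch, assuming HTTP/1.1 syntax; `build_get` and `parse_status_line` are hypothetical helper names, and nothing is sent on the network.

```python
from urllib.parse import urlsplit


def build_get(url: str) -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",  # empty line: end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[int, str]:
    """Return (status code, reason phrase) from the first line of a reply."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = first_line.split(" ", 2)
    return int(code), reason


request = build_get("http://www.rfc-editor.org/rfc/rfc2616.txt")
print(request.decode("ascii"))
print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"))
```

In a real session these bytes would be written to the TCP connection established in step 1.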
Protocol “layers”
Networks are complex, with many pieces:
• Hosts
• Routers
• Links of various media
• Applications
• Protocols
• Hardware, software
How can we organize the structure of a network?
Why layering?
Dealing with complex systems
• An explicit structure allows identification of the pieces of a complex system and of the relationships between them
- Layered reference model for discussion
• Modularization eases maintenance and updating of the system
- A change in the implementation of a layer's service is transparent to the rest of the system
- e.g., a change in gate procedure does not affect the rest of the system
But... keep in mind that
• It is difficult to draw a line between layers and between protocols
• Sometimes we still need more information from a layer
• Sometimes we will duplicate operations
Internet protocol stack
Application: supporting network applications
• FTP, SMTP, HTTP
Transport: process-to-process data transfer
• TCP, UDP
Network: routing of datagrams from source to destination
Link: data transfer between neighboring network elements
• PPP, Ethernet
Physical: bits "on the wire"
Stack, from top to bottom: Application / Transport / Network / Link / Physical
Protocols stack for the Internet (subset)
Application: applications
Transport: UDP, TCP
Network: IP, ARP, ICMP
Link: Ethernet, IEEE 802.3 (+ LLC + SNAP), PPP
Physical
The border between layers is not easy to draw
Some protocols are not easy to place (e.g., ICMP, ARP)
Application layer protocols: ping, http
Routing protocols: OSPF, RIP
Transport: UDP, TCP
Network: IP, ARP, ICMP
Link: Ethernet, IEEE 802.3 (+ LLC + SNAP), PPP
Physical
IP network: encapsulation
Example: TCP/IP in a local Ethernet network
Application: application data (data chunk)
TCP: TCP segment = TCP header + data chunk
IP: IP packet = IP header + TCP header + data chunk
Ethernet driver: Ethernet frame = Ethernet header + IP header + TCP header + data chunk + CRC
Physical: the frame is sent on the Ethernet
IP Network: demultiplexing
Example: TCP/IP in a local Ethernet network
Question: how is the data moved up the protocol stack?
Answer: the headers include the information
• Ethernet header: Type field (0x800 = IP, 0x806 = ARP)
• IP header: Protocol field (1 = ICMP, 6 = TCP, 17 = UDP)
• TCP or UDP header: source and destination port number fields (80 = http over TCP, 520 = RIP over UDP)
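The demultiplexing chain can be sketched as three table lookups, one per header. This is only an illustration using the values shown on the slide; the table and function names are hypothetical.

```python
# Values taken from the slide; the dictionaries themselves are illustrative.
ETHERTYPES = {0x0800: "IP", 0x0806: "ARP"}
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}
PORTS = {("TCP", 80): "http", ("UDP", 520): "RIP"}


def demultiplex(ethertype, ip_protocol=None, dst_port=None):
    """Return the list of protocols a frame goes through on its way up."""
    path = ["Ethernet"]
    network = ETHERTYPES.get(ethertype)
    if network is None:
        return path  # unknown Type field: the frame goes no further
    path.append(network)
    if network != "IP":
        return path  # ARP is not carried inside IP
    transport = IP_PROTOCOLS.get(ip_protocol)
    if transport is None:
        return path
    path.append(transport)
    application = PORTS.get((transport, dst_port))
    if application is not None:
        path.append(application)
    return path


print(demultiplex(0x0800, 6, 80))
print(demultiplex(0x0806))
```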
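IP network: heterogeneous networks interconnection
Web client (end host) and Web server (end host): http / TCP / IP over their local link
Router (node): IP, with an Ethernet interface on the LAN side and a PPP interface on the serial link side
• http (application layer) and the TCP protocol run end-to-end between the two end hosts
• The IP protocol runs on each hop: end host to router, router to end host
• The Ethernet protocol is used on the LAN, PPP on the serial link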
The Internet Protocol
Two incompatible versions
• IPv4: most used version today
• IPv6: built on the IPv4 experience
A datagram service
• Routes the packets from a source to a destination
• Each packet includes the complete address of the destination
• Connectionless service
- Each packet is handled independently
– Two consecutive packets may use different routes
What does IP offer to upper layers?
A datagram service
• Unreliable
- Packet loss
- Duplication
- Packet order may not be preserved
• Best-effort service
IP networks interconnection
(Figure: hosts A and B, in two autonomous systems AS 1 and AS 2)
Addresses have a global significance (IPv4)
• There should not be duplicated addresses (except behind NAT, of course)
IP packet forwarding
• Routers use the addresses within the packets
• Routing table: what is the next hop, i.e. the next router on the way to the destination?
Standardization... The IETF
A few standardization bodies
ITU-T
• International standardization body - T stands for Telecommunication
• Gives the rough guidelines
ETSI - ANSI
• Regional standardization bodies, adapting the ITU-T standards to their specific region
• May now take the lead on proposals (e.g., GSM)
IEEE
• Specifies the physical and link layers for various technologies - 802.3, Bluetooth, Wifi, Wimax
IETF
• The IP world
- Network, transport and application layers
IP networks and Internet: standardization
ISOC (Internet Society)
IAB (Internet Architecture Board)
• IETF (Internet Engineering Task Force), steered by the IESG (Internet Engineering Steering Group)
- Organized into Areas, each made of Working Groups (WG)
• IRTF (Internet Research Task Force), steered by the IRSG (Internet Research Steering Group)
- Organized into Research Groups (RG)
IANA (Internet Assigned Numbers Authority)
The IETF Internet Engineering Task Force
Organized into areas
• Ex.: Transport - Routing
Each area is organized into Working Groups
• Ex.: sip - mext - roll
Principle: "rough consensus and running code"
Open to anyone
• 3 face-to-face meetings per year...
• ... but everything is done on mailing lists
The IETF Internet Engineering Task Force
Free access to documents
• http://www.ietf.org/
Documents
• Standards: Request For Comments (RFC)
- Ex.: RFC 793 (TCP)
- Permanent
• Working documents: Internet Drafts
- Ex.: draft-ietf-tsvwg-tcp-eifel-alg-07.txt (Eifel algorithm for TCP)
- Last only 6 months
The IETF Internet Engineering Task Force
Standardization process
• Standards track:
- Submission of a personal draft: draft-untel-mon-sujet-favori-00.txt
– E-mail to: [email protected]
- Discussion within mailing lists and at the IETF meetings
– Try to make the draft become a working group item: draft-ietf-xxxwg-my-subjet-00.txt
– Reach a consensus on the mailing list (known as Last Call)
– Give the document to an Area Director
– Last call in all groups
– If accepted: send it to the RFC Editor (and IANA, if values need to be allocated)
- RFC: proposed standard, then draft standard and finally standard
RFCs
Different classes of RFC:
• Documents coming from the standardization process (proposed standard, draft standard, standard)
• Other
- Experimental
- BCP (Best Current Practice)
- Informational
- ...
• Pay attention to the date!
- RFC 1149 (April 1st 1990): A Standard for the Transmission of IP Datagrams on Avian Carriers
- RFC 2549 (April 1st 1999): IP over Avian Carriers with Quality of Service
- RFC 3514 (April 1st 2003): the evil bit in the IP header
Standards evolution
In general, one protocol ≠ one RFC
• Example: TCP
- RFC 793 (first specification)
- RFC 1122 (Requirements for Internet Hosts)
- RFC 1323 (Extensions for High Performance)
- RFC 2018, 2883 (Selective Acknowledgment)
- RFC 2581 (Congestion Control)
- RFC 2988 (Retransmission Timer)
- etc.
Research activities: IRTF
http://www.irtf.org/
Looks at a much longer term
Sometimes groups are private
• Decided by the chair
• Mailing lists are public
Some examples
• End-to-end (closed today)
• Anti-spam
• Virtual networks
• DTN (Delay/Disruption Tolerant Network)
Nowadays, research is carried out through scientific publications
Ethernet and 802.3
Expand the link layer
The link layer of the stack (Application / Transport / Network / Link / Physical) is split into two sublayers:
• LLC (Logical Link Control): error control, flow control
• MAC (Medium Access Control): share the link, addressing
Local Area Network
Medium shared by several devices
• Operation mode: broadcast, plus rules defining how to access the medium
- No centralized operation
If the medium is shared, we need
• To be able to identify a device => addresses
• Rules to access the medium
Some properties
• No configuration required to operate
• Need not be scalable
• Addresses still need to be unique
Architecture
IEEE 802.3 and Ethernet
Ethernet vs. IEEE 802.3:
• Same physical layer
• Same medium access method
• MAC frame:
- Same format...
- ...but different usage of one field
=> Two non-compatible protocols
IEEE addressing
Made of 2 parts
• Manufacturer part (the OUI code), bought from the IEEE by the manufacturer; guarantees uniqueness
• Identification part (serial number)
- Must be unique for a given manufacturer
The MAC address format is the same whatever the protocol (Ethernet, Wifi, ...)
• Eases network interconnection
MAC address format (IEEE 802.1 standard)
Unique address (worldwide): roughly 10^14 different addresses
All OUI codes can be looked up at: http://standards.ieee.org/regauth/oui/index.shtml
Layout (48 bits): I/G bit | U/L bit | 46 bits
• I/G bit: 0 = I (individual address), 1 = G (group address)
• U/L bit: 0 = U (universal address), 1 = L (local address)
• 3 bytes: manufacturer code (OUI), then 3 bytes: serial number
MAC address format (802.1)
OUI code (3 bytes, hexadecimal)    Manufacturer
00-00-0C                           Cisco
00-03-93                           Apple
02-80-8C                           3Com
08-00-20                           Sun
08-00-5A                           IBM
Three address families, three operating modes
• Point-to-point (unicast): one device
• Broadcast: designates all equipment on the network (FF-FF-FF-FF-FF-FF)
• Restricted broadcast (multicast): designates a subset of the equipment (first bit of the address equal to 1)
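The address structure described above can be checked with a few bit operations. This is a minimal sketch; `parse_mac` is a hypothetical helper. It relies on the fact that the I/G bit is the first bit sent on the wire, which is the least significant bit of the first byte, with the U/L bit next to it.

```python
def parse_mac(mac: str) -> dict:
    """Split a MAC address into OUI and serial number and decode its flag bits."""
    octets = bytes(int(part, 16) for part in mac.replace(":", "-").split("-"))
    if len(octets) != 6:
        raise ValueError("a MAC address has 6 bytes")
    first = octets[0]
    return {
        "oui": octets[:3].hex("-").upper(),      # manufacturer code
        "serial": octets[3:].hex("-").upper(),   # serial number
        "group": bool(first & 0x01),             # I/G bit: 1 = group address
        "local": bool(first & 0x02),             # U/L bit: 1 = local address
        "broadcast": octets == b"\xff" * 6,
    }


print(parse_mac("00-00-0C-12-34-56"))   # the last 3 bytes are made up
print(parse_mac("FF-FF-FF-FF-FF-FF"))
```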
Unicast
Frame from D to B
D sends the message on the medium in broadcast
All stations' network interfaces receive the message
Only the interface configured with the destination MAC address passes the message up the stack
• Layer 2 filtering: done by the card
Access mode: CSMA/CD
Carrier Sense Multiple Access / Collision Detect
• CSMA: before transmitting, the sender probes the channel to detect an ongoing transmission
• CD: the sender checks whether someone else is also sending at the same time (= collision)
Collision = bad reception
• Frames need to be transmitted again
CSMA/CD : simplified algorithm
1. If the channel is free, then send the frame
2. If the channel is busy, wait for it to become idle, then send
3. If a collision occurs
a. Stop the transmission
b. Wait a random time and go back to 1
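The simplified algorithm can be sketched as a loop. This is an illustration only: `channel_is_busy` and `collision_detected` are hypothetical callbacks standing in for the hardware, time is counted in abstract slots, and the random wait (1 to 10 slots) and the limit of 16 attempts are assumptions, not the real backoff rules.

```python
import random


def csma_cd_send(channel_is_busy, collision_detected, rng=random, max_attempts=16):
    """Try to send one frame; return (attempts used, slots spent waiting)."""
    slots_waited = 0
    for attempt in range(1, max_attempts + 1):
        # Step 2: if the channel is busy, wait for it to become idle.
        while channel_is_busy():
            slots_waited += 1
        # Step 1: the channel is free, send the frame.
        if not collision_detected():
            return attempt, slots_waited
        # Step 3: collision, so stop transmitting, wait a random time, retry.
        slots_waited += rng.randint(1, 10)
    raise RuntimeError("too many collisions, giving up")


# Scripted channel: busy once, then two collisions before a successful send.
busy = iter([True, False, False, False])
collisions = iter([True, True, False])
print(csma_cd_send(lambda: next(busy), lambda: next(collisions)))
```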
Transmission on an idle medium
A probes the channel: channel free => sending
Sending on a busy channel
B probes the channel: channel busy => wait ... until the channel is idle
A probes the channel: channel free => sending
Collision because of the channel propagation delay
(Time diagram for station A and station B)
• A probes the channel: it's idle, let's send
• B probes the channel: detects a signal, waits
• Because of the propagation delay, B may detect that the channel is idle while A's signal is still on its way => B sends its frame => collision
• A detects a collision => stops transmitting
Frame size
We need an upper bound and a lower bound on the frame size
Maximum size
• Goal: prevent a station from using the channel for too long
• Fixed to 1518 bytes
Minimum size
• Goal: help in detecting collisions (see next slide)
• Fixed to 64 bytes
If frames were too small
(Time diagram for stations A, B, C and D: max. propagation delay = τ, sending time < 2τ)
A and B: stations that are far away from each other
In this example, A and B both finish transmitting their messages, but:
• A and B do not detect the collision
• C correctly receives the frame from A, but not from B
• D correctly receives the frame from B, but not from A
By adding padding (additional bits at the end of the frame) so that the sending time is at least twice the propagation delay, we avoid this problem
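The condition "sending time at least twice the propagation delay" is simple arithmetic. The sketch below works it out for the 64-byte minimum frame at 10 Mbit/s; the function names are hypothetical and the 25 µs propagation delay is an assumed example value.

```python
def transmission_time_us(frame_bytes: int, bitrate_mbps: float) -> float:
    """Time needed to put a frame on the wire, in microseconds."""
    return frame_bytes * 8 / bitrate_mbps  # bits / (Mbit/s) gives microseconds


def collision_always_detected(frame_bytes, bitrate_mbps, max_propagation_delay_us):
    """True if the sender is still transmitting when a collision can reach it."""
    return transmission_time_us(frame_bytes, bitrate_mbps) >= 2 * max_propagation_delay_us


print(transmission_time_us(64, 10))             # 64 bytes at 10 Mbit/s
print(collision_always_detected(64, 10, 25.0))  # assumed delay of 25 µs
print(collision_always_detected(20, 10, 25.0))  # a frame that is too short
```

With these numbers a 64-byte frame takes 51.2 µs to send, so collisions are always detected as long as the one-way propagation delay stays below 25.6 µs.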
If the minimum frame size is bounded...
(Time diagram for stations A, B, C and D: example with a minimum frame duration = 2τ + ε; the collision is detected by A at time 2τ)
In this example, the minimum sending time is > 2 times the propagation delay
A sees the channel free and starts sending
B sees the channel free and starts sending
B notices the collision almost immediately
B keeps transmitting for a few moments, so that the collision is clearly detectable by the other devices
• Jamming data (reinforcing the collision) is sent for a time δ < ε
Frame format
Encapsulation at the physical layer
Preamble (7 bytes) | Start of frame (1 byte) | MAC data (from 64 to 1518 bytes) | minimum inter-frame silence (IFS) | preamble of the next frame
• Preamble: 7 × 10101010 (binary)
• Start of frame: 10101011 (binary)
Preamble: allows the receiver to synchronise (10101010 in binary = square signal in Manchester coding)
Inter-frame silence: separates two successive frames
• 802.3 / Ethernet at 10 Mbit/s: IFS = 9.6 µs
Frame format
Ethernet:
Destination address (6 bytes) | Source address (6 bytes) | Higher layer protocol number (2 bytes) | Upper layer data (≥ 0 bytes) | Padding (≥ 0 bytes) | CRC (4 bytes)
IEEE 802.3:
Destination address (6 bytes) | Source address (6 bytes) | Size (2 bytes) | LLC data (≥ 0 bytes) | Padding (≥ 0 bytes) | CRC (4 bytes)
Data + padding: from 46 to 1500 bytes
The fields from the destination address to the padding are used for the CRC
How to distinguish 802.3 from Ethernet?
Value coded in the protocol type / length field (the 2 bytes that follow the source address):
• if ≤ 1500 = 0x5DC, then 802.3 (the field is a size)
• if > 1500 = 0x5DC, then Ethernet (the field is a protocol number; protocol codes are always > 1500)
In both formats, data + padding take from 46 to 1500 bytes
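The test on the type / length field can be sketched as follows. `classify_frame` is a hypothetical helper that applies the slide's rule (threshold at 1500) to a frame given without its preamble; in practice, protocol numbers start at 0x0600 (1536).

```python
import struct


def classify_frame(frame: bytes) -> str:
    """Tell Ethernet from IEEE 802.3 by looking at bytes 12-13 of the frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the 14-byte MAC header")
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length <= 1500:
        return "IEEE 802.3"   # the field is the size of the LLC data
    return "Ethernet"         # the field is a higher layer protocol number


addresses = bytes(12)  # placeholder destination + source addresses
print(classify_frame(addresses + struct.pack("!H", 0x0800) + bytes(46)))
print(classify_frame(addresses + struct.pack("!H", 46) + bytes(46)))
```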
Ethernet and the Network layer
Ethernet frame: Destination address (6 bytes) | Source address (6 bytes) | Higher layer protocol number (2 bytes) | Upper layer data (≥ 0 bytes) | Padding (≥ 0 bytes) | CRC (4 bytes)
Is this only data, or data and padding?
At the MAC layer, we don't know!
• The padding bytes are passed to the upper layer => so much for the layered architecture!
• The PDU at the upper layer MUST have a Length field
Missing...
• More details on CSMA/CD and the backoff algorithm
• LLC - Logical Link Control
• Types of cable and bandwidth
• Switch
• Bridge - spanning tree algorithm
• Virtual LAN (VLAN)
The IP layer
IPv4 header format
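Rows of 32 bits (bits numbered 0 to 31); the header takes 20 bytes without options
Row 1: Version = 4 (bits 0-3) | Header length (bits 4-7) | Type of service (bits 8-15) | Total length, in bytes (bits 16-31)
Row 2: Identification (bits 0-15) | Flags (bits 16-18) | Fragment (bits 19-31)
Row 3: Time to Live (TTL) (bits 0-7) | Protocol (bits 8-15) | Header checksum (bits 16-31)
Row 4: IP source address
Row 5: IP destination address
Options (if any) | Padding
Data (from the upper layer)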
Length fields
Header length (4 bits): in 32-bit words
• Maximal size (with options) = 60 bytes => max. 40 bytes for options
Total length (16 bits)
• Theoretical maximum = 2^16 – 1 bytes
• IP over Ethernet: needed to distinguish the information from the padding
Ethernet frame: Destination address | Source address | protocol = IP (0x800) | IP packet (≥ 0 bytes) | padding (≥ 0 bytes) | CRC
• Both the IP packet and the padding are sent to the upper layer
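A minimal sketch of how the two length fields are used when decoding a packet received over Ethernet. The helper names are hypothetical, the sample addresses are arbitrary, and the checksum is left at 0 instead of being computed.

```python
import ipaddress
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    (version_ihl, _tos, total_length, _identification, _flags_fragment,
     ttl, protocol, _checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # field counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "source": str(ipaddress.IPv4Address(source)),
        "destination": str(ipaddress.IPv4Address(destination)),
    }


def strip_ethernet_padding(ethernet_payload: bytes) -> bytes:
    """Keep only the IP packet, using Total length to drop the padding."""
    total_length = parse_ipv4_header(ethernet_payload)["total_length"]
    return ethernet_payload[:total_length]


# A 28-byte packet (20-byte header + 8 bytes of data) padded up to 46 bytes.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 123, 0, 64, 17, 0,
                     bytes([10, 1, 1, 101]), bytes([10, 1, 2, 101]))
payload = header + b"8 bytes!" + bytes(18)
print(parse_ipv4_header(payload))
print(len(payload), len(strip_ethernet_padding(payload)))
```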
Type of service (TOS)
Current definition (RFC 2474, 3168): differentiation of service (DiffServ) + congestion notification (ECN)
• DSCP field:
- 6 values for Class Selector (compatible with the previous Priority field)
- 12 values for Assured Forwarding
- 1 value for Expedited Forwarding
Bits 0-5: DiffServ Code Point (DSCP) | Bits 6-7: ECN
Fragmentation
If the link MTU does not allow transporting the packet
• Send the packet in fragments
Reassembly is performed by the destination
Costly for routers
Figure: [Tanenbaum, 2002]
Fragmentation: header fields
Fields: Identification (16 bits) | 0 (1 bit) | DF (1 bit) | MF (1 bit) | Position of the fragment (13 bits)
Identification: unique number (for the sender)
• If the packet is fragmented again, all fragments will still have this number
Position of the fragment: position of the 1st byte of the fragment in the original datagram
• Counted in multiples of 8 bytes
DF (don't fragment) = 1: the packet must not be fragmented
• If fragmentation is needed: the packet is discarded and an ICMP message is returned to the source
MF (more fragments)
• MF = 0: last fragment
Flags by default (for a non-fragmented packet): DF = MF = 0
Fragmentation: an example
Path: E - R1 - R2 - D, with max. data = 4096 between E and R1, 1024 between R1 and R2, 512 between R2 and D
Identification | 0 | DF | MF | Position | Data
Sent by E:
123456 | 0 | 0 | 0 | 0   | 2021 bytes of data
After R1:
123456 | 0 | 0 | 1 | 0   | 1024 bytes
123456 | 0 | 0 | 0 | 128 | 997 bytes
After R2:
123456 | 0 | 0 | 1 | 0   | 512 bytes
123456 | 0 | 0 | 1 | 64  | 512 bytes
123456 | 0 | 0 | 1 | 128 | 512 bytes
123456 | 0 | 0 | 0 | 192 | 485 bytes
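The values of this example can be recomputed with a short sketch. `fragment` is a hypothetical helper working on data lengths only (headers are ignored, as in the "max. data" figures of the slide); positions are expressed in units of 8 bytes.

```python
def fragment(data_length, max_data, position=0, more_fragments=False):
    """Split a packet; return a list of (position, MF, data length) tuples.

    position and more_fragments describe the packet being split, so that a
    fragment can be fragmented again by a later router.
    """
    chunk = (max_data // 8) * 8  # all fragments but the last: multiple of 8 bytes
    result = []
    start = 0
    while start < data_length:
        length = min(chunk, data_length - start)
        is_last = start + length >= data_length
        mf = 0 if (is_last and not more_fragments) else 1
        result.append((position + start // 8, mf, length))
        start += length
    return result


after_r1 = fragment(2021, 1024)
after_r2 = [piece
            for position, mf, length in after_r1
            for piece in fragment(length, 512, position, bool(mf))]
print(after_r1)
print(after_r2)
```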
Time To Live field (TTL)
Initialised with a value > 0
• Typical value = 64
Decremented by 1:
• Each time the packet crosses a router
• Once per second, while the packet is waiting for reassembly at the destination
Purpose: a packet caught in a routing loop (e.g., between R1, R2 and R3 when routing tables are erroneous) is eventually discarded instead of circulating forever between A and B
Protocol field
Link: Ethernet / SNAP (type = 0x800 for IP, type = 0x806 for ARP)
Network: IP, ARP, ICMP (in the IP header: protocol = 1 for ICMP, protocol = 6 for TCP, protocol = 17 for UDP)
Transport: UDP, TCP
Application: ping, DHCP, traceroute
IP addresses
Why do we need addresses?
• Is it identification, then?
• Location only!
How many addresses per node? Who gets an address?
Possible analogy
• Fixed telephone number? Almost...
IP Addresses
32 bits
• Human representation: 4 blocks of 1 byte, separated by dots
• Decimal representation
Includes a subnet (network) address and an interface address
• Delimited by the netmask: 131.254.100.48/24
Binary:  10000011 11111110 01100100 00110000
Decimal: 131 . 254 . 100 . 48
Netmask (32 bits): 11111...11111 000...0000, i.e. a run of 1s of variable length (prefix) followed by 0s (identifier)
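The split between prefix and identifier can be checked with Python's standard `ipaddress` module. This is a small sketch; `split_address` and `same_subnet` are hypothetical helper names.

```python
import ipaddress


def split_address(cidr: str):
    """Return (network prefix, netmask, interface identifier) for 'a.b.c.d/n'."""
    interface = ipaddress.ip_interface(cidr)
    identifier = int(interface.ip) & int(interface.hostmask)
    return str(interface.network), str(interface.netmask), identifier


def same_subnet(cidr_a: str, cidr_b: str) -> bool:
    """True if both interfaces sit in the same subnet."""
    return ipaddress.ip_interface(cidr_a).network == ipaddress.ip_interface(cidr_b).network


print(split_address("131.254.100.48/24"))
print(same_subnet("10.1.1.101/24", "10.1.1.102/24"))
print(same_subnet("10.1.1.101/24", "10.1.2.101/24"))
```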
IP addresses: subnets
Net1: PC1 = 10.1.1.101/24, PC2 = 10.1.1.102/24, PC3 = 10.1.1.103/24, router interface = 10.1.1.1/24
Net2: PC4 = 10.1.2.101/24, PC5 = 10.1.2.102/24, router interface = 10.1.2.1/24
IP addresses and netmask: example
Which part belongs to the network, and which one belongs to the interface?
(Figure: an IP address split into a Network part and an Interface part)
Figure: [Toutain, 2003]
Particular IP addresses
Loopback: no packet is sent on the network
Broadcast: reach all nodes on the local network
• Mapping between IP broadcast and MAC broadcast
Special addresses (from [Tanenbaum, 2002]):
• All 0s: this host
• Network part all 0s + host: a host in this network
• All 1s: broadcast on the local network
• Network + host part all 1s: broadcast on a distant network
• 127 + any value: loopback
What do we use?
• What do we want to use? Names
• What is the network actually using? Addresses
Name resolution system: Domain Name System
• "Where is ietf.org?"
• "It is at 56.54.29.4"
IP addresses: allocation and thoughts
IP address: unique identifier with a global scope
• Addressing: needs to be scalable
32 bits: (in theory) gives 2^32 ≈ 4 billion different addresses
• Fixed size to simplify the routing decision
- Address handling is done per packet (datagram!)
How can IP addresses be allocated?
• If random choice
- What about duplication? (address uniqueness)
- Routing?
• Fixed allocation to nodes
- What about nomadism?
• Need an efficient allocation system
- Finite address space: no waste!
IP addresses allocation
Before 1994
• Classful addressing
After 1994
• Classless addressing, called CIDR (Classless Inter-Domain Routing)
Why do we always need more?*
• Mobile devices
• "Always on" connections, compared to dial-up modems
• Internet demographics
• Inefficient address space use
• Virtualization
*Source: Wikipedia - http://en.wikipedia.org/wiki/IPv4_address_exhaustion
IP addresses classes (before 1994)
Figure: [Tanenbaum, 2002]
126 class A networks of ≈ 16 × 10^6 machines each
About 16 × 10^3 class B networks of ≈ 65 × 10^3 machines each
About 2 × 10^6 class C networks of ≈ 250 machines each
IP addressing (before 1994)
Flat addressing space
• No hierarchical numbering
• Management through a central entity - the Network Information Center
• No relation between an address and the geographic position: simplifies the administration
- 128.92 / 16 = IntelliCorp (US)
- 128.93 / 16 = INRIA (France)
However...
• Very inefficient use of the address space
- Choice between a class C (/24 - 254 possible hosts) and a class B (/16 - 65 000 possible hosts)
- Depletion of the class B address space...
Evolution
• CIDR - classless addresses
• Private addresses + NAT
• IPv6 (128-bit addresses)
Network Address Translation
Basics (RFC 3022)
• Share IP address(es) between several users
• The number of IP addresses is usually smaller than the number of clients
- Example: a company network
Network Address and Port Translation
• Uses the port number to differentiate users
• Inside the LAN, each device has a different (private) IP address
– Typically in 10.0.0.0 / 8
• From the outside, only a small number of IP addresses are used
– Public (routable) addresses
Interface between the LAN and the Internet = NAT box
• Dynamic conversion of the addresses for incoming and outgoing packets
• Typical implementation (home networking): NAT + firewall (+ router (+ Wifi AP))
NAT: an illustration
(Figure: the NAT box replaces the private source address with the public source address)
Source: [Tanenbaum, 2002]
Translation table:
Private source @ | Private source port | Public source port
10.0.1.2         | 63378               | 51031
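A translation table of this kind can be sketched with two dictionaries. This is only an illustration: the class name `Napt`, the public address 192.0.2.1 (a documentation address) and the sequential port allocation starting at 51031 are assumptions.

```python
class Napt:
    """Toy Network Address and Port Translation table."""

    def __init__(self, public_ip: str, first_port: int = 51031):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private address, private port) -> public port
        self.inbound = {}   # public port -> (private address, private port)

    def outgoing(self, private_ip: str, private_port: int):
        """Rewrite the source of an outgoing packet; returns (address, port)."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def incoming(self, public_port: int):
        """Find the private destination of an incoming packet, or None."""
        return self.inbound.get(public_port)


nat = Napt("192.0.2.1")
print(nat.outgoing("10.0.1.2", 63378))
print(nat.incoming(51031))
```

An incoming packet for a port with no entry returns `None`, which is one reason a NAT box is often combined with a firewall.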
Addresses management - Post-1994
1st idea: strict rules for address allocation (RFC 1466)
• Restrict the usage of class B with specific rules
• Allocate contiguous blocks of class C addresses
- Contiguous blocks of class C addresses were allocated to the RIRs, which can then allocate smaller contiguous blocks to LIRs / ISPs
2nd idea: forget the classes! (RFC 1518, 1519)
• CIDR (Classless Inter-Domain Routing)
• Allocate the remaining addresses in blocks of variable size, depending on the needs
• Try to recover already allocated prefixes and allocate them again according to these (new) rules
Decentralized organization
• IANA manages the global pool of addresses
• Regional centers receive blocks and redistribute them to ISPs
CIDR : principle
Address => prefix + prefix length
• No more classes: the first bits of the address no longer mean anything
Hierarchy of administrative bodies
• Coordination: IANA (Internet Assigned Numbers Authority)
• Regional Internet Registries (RIR) (and then LIR / ISP)
Prefixes allocation
Figure: [Toutain, 2003]
IANA
• RIPE-NCC: 62/8, 80/7, 193/8, 194/7
- ISP 1: 62.125/16
– Site 1: 62.125.44.128/25
– Site 2: 62.125.50/24
- ISP 2: 195.44/14
– Site 3: 195.46.216/21
- ISP 3: 195.47/16
– Site 4: 195.47.172/22
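The nesting of these prefixes can be verified with the standard `ipaddress` module. The sketch below assumes the short notation of the figure (e.g. 62.125/16 for 62.125.0.0/16); `expand` and `is_inside` are hypothetical helpers.

```python
import ipaddress


def expand(short_prefix: str) -> ipaddress.IPv4Network:
    """Expand the short notation, e.g. '62.125/16' into 62.125.0.0/16."""
    address, length = short_prefix.split("/")
    octets = address.split(".")
    octets += ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(octets) + "/" + length)


def is_inside(child: str, parent: str) -> bool:
    """True if the child prefix is part of the parent prefix."""
    return expand(child).subnet_of(expand(parent))


print(is_inside("62.125.44.128/25", "62.125/16"))  # Site 1 within ISP 1
print(is_inside("62.125/16", "62/8"))              # ISP 1 within a RIPE-NCC block
print(is_inside("195.46.216/21", "195.44/14"))     # Site 3 within ISP 2
print(is_inside("195.47.172/22", "62/8"))          # Site 4 is not in 62/8
```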
Today's view of available addresses... none!
Check the daily report
• http://www.potaroo.net/tools/ipv4/index.html
Exhaustion of the IANA pool: Feb-2011
RIR unallocated address pool exhaustion: 15-Apr-2011
The end of the Internet??
Did you forget about IPv6???
So what is IPv6?
A new version of the Internet protocol
• Benefits from the long experience of the Internet
An enabler
Example address: 2001:660:3003:1:0:0:6543:210F
Longer, much longer addresses: 128 bits
• 3.4 × 10^38 addresses
• 60 000 trillion trillion addresses per inhabitant of Earth
• An address for every grain of sand in the world
IPv6 address format
A 16-byte global IPv6 address:
• 2001:0660:3003:0001:0000:0000:6543:210F
• Hexadecimal notation: words of 2 bytes separated by colons
Compact form
• Remove the zeros on the left of each word: 2001:660:3003:1:0:0:6543:210F
• Substitute one sequence of zero words by :: (only one, to avoid ambiguity): 2001:660:3003:1::6543:210F
• An IPv4 address may also appear, as in ::FFFF:123.12.34.56
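Both compaction steps can be reproduced with the standard `ipaddress` module, which prints hexadecimal digits in lowercase. A short sketch; `strip_leading_zeros` is a hypothetical helper showing the first step on its own.

```python
import ipaddress


def strip_leading_zeros(address: str) -> str:
    """First step only: remove the zeros on the left of each word."""
    return ":".join(format(int(word, 16), "x") for word in address.split(":"))


full = "2001:0660:3003:0001:0000:0000:6543:210F"
address = ipaddress.IPv6Address(full)

print(strip_leading_zeros(full))  # 2001:660:3003:1:0:0:6543:210f
print(address.compressed)         # 2001:660:3003:1::6543:210f
print(address.exploded)           # back to the full form
print(ipaddress.IPv6Address("::FFFF:123.12.34.56").ipv4_mapped)
```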
IPv6 address scheme
RFC 4291 defines current IPv6 addresses
• loopback (::1)
• link local (fe80::/10)
• global unicast (2000::/3)
• multicast (ff00::/8)
Use CIDR principles:
• Prefix / Prefix length
• 2001:660:3003::/48
• 2001:660:3003:2:a00:20ff:fe18:964c/64
Interfaces have several IPv6 addresses
• At least link local and global unicast addresses
IPv6 address format
Global unicast address
Link local address
IPv6 header
40 bytes (5 words of 64 bits), shown in rows of 32 bits:
Ver. | Traffic Class | Flow label
Payload length | Next Header | Hop Limit
Source Address
Destination Address
Address configuration
What, how, who?
What do you need?
• An address => how to get it? (depends on your point of attachment)
• The router address
• A DNS server
3 ways
• Manual (not doable!)
• Automatic
• Dynamic
IPv4
• DHCP
IPv6
• Neighbor Discovery - address autoconfiguration
• DHCP
Auto-configuration: DHCP
Application: ping, DHCP, traceroute
Transport: UDP, TCP
Network: IP, ARP, ICMP
Link: Ethernet / IEEE 802.3
DHCP (Dynamic Host Configuration Protocol)
Dynamic Host Configuration Protocol
- RFC 1541, RFC 2131 / 2132
- Derived from an older protocol called BOOTP
• Functions
- IP address allocation
- IP stack configuration
- Initialisation of various parameters
- Retrieval of a boot file
• Works on top of UDP
DHCPv4: IP address allocation
Static allocation: manual or automatic. Dynamic allocation: dynamic.
Three types of address allocation
• Automatic
- An IP address is selected from a pool and permanently allocated to a client
• Manual
- The administrator uses a configuration file
- Matches an IP address with a MAC address
- May be used for security reasons
• Dynamic
- The address is selected from a pool
- For a given period of time
- Method used by ISPs
DHCPv4: how does it work?
Setting: a DHCP client and two DHCP servers (A and B) on LAN 95.7.8.0/23, with router AR at 95.7.8.1
Messages, written as (source, destination, type):
1. Client: (0.0.0.0, 255.255.255.255, DISCOVER)
2. Each server answers with an OFFER
3. Client: (0.0.0.0, 255.255.255.255, REQUEST)
4. The selected server answers with an ACK
5. Later, the client sends a RELEASE to that server
Parameters obtained by the client: IP address: 95.7.9.45, Netmask: 255.255.254.0, GW: 95.7.8.1, WINS addr, …
DHCPv4: messages
Requests from clients
• DISCOVER (1)
- Asks for an address allocation
- Lists the parameters requested by the client (domain name, network mask, DNS, etc.)
• REQUEST (3)
- Response to the OFFER message, or renewal of an allocation
- The servers that were not chosen release the parameters they had selected for the client
• DECLINE (4)
- Indicates that the offered address is already in use
• RELEASE (7)
- Frees an IP address
DHCPv4: messages
Responses from a server
• OFFER (2)
- Response to a DISCOVER message
- Includes the first parameters
• ACK (5)
- Acknowledgement; includes the parameters and the IP address allocated to the client
• NAK (6)
- Signals the end of an allocation, or bad parameters
The message type is coded as an option
• Option 53
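The message types and their encoding as option 53 can be sketched as follows. The numbers are the ones listed on the slides; the option layout (code, length, value, one byte each) follows RFC 2132, and the class and function names are hypothetical.

```python
from enum import IntEnum


class DhcpMessageType(IntEnum):
    DISCOVER = 1
    OFFER = 2
    REQUEST = 3
    DECLINE = 4
    ACK = 5
    NAK = 6
    RELEASE = 7


OPTION_MESSAGE_TYPE = 53


def encode_message_type(message_type: DhcpMessageType) -> bytes:
    """Encode option 53 as code, length (1) and value."""
    return bytes([OPTION_MESSAGE_TYPE, 1, int(message_type)])


def decode_message_type(option: bytes) -> DhcpMessageType:
    """Decode a 3-byte option 53 back into a message type."""
    code, length, value = option[0], option[1], option[2]
    if code != OPTION_MESSAGE_TYPE or length != 1:
        raise ValueError("not a DHCP message type option")
    return DhcpMessageType(value)


print(encode_message_type(DhcpMessageType.DISCOVER).hex())
print(decode_message_type(bytes([53, 1, 5])).name)
```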
What else?
(Figure: on LAN 95.7.8.0/23, with the router at 95.7.8.1, host 95.7.9.45 wants to send an IP packet to host 95.7.9.40)
IP header: IP source: 95.7.9.45, IP destination: 95.7.9.40
Ethernet header: MAC source: 00:26:08:e1:52:c5, MAC destination: ??
Data
Address Resolution Protocol
Application: ping, DHCP, traceroute
Transport: UDP, TCP
Network: IP, ARP, ICMP
Link: Ethernet / IEEE 802.3
ARP
Address Resolution Protocol (RFC 826)
Mapping between the network address (IP) and the MAC address
• Applications only handle IP addresses
• Frames are exchanged using MAC addresses
We need to know the destination MAC address to send a frame
May also be used for duplicate address detection
Dynamic cache
• Built and updated by the system
• Each entry has a finite lifetime
morrocoy[15:24]% arp -a
default-gw.irisa.fr (131.254.1.1) at 0:4:80:13:69:0
air.irisa.fr (131.254.60.130) at 8:0:20:89:58:95
sky.irisa.fr (131.254.60.147) at 8:0:20:ac:44:3
cuvert1.irisa.fr (131.254.70.14) at 8:0:11:13:99:e5
How does it work?
If the destination address is not known (not in the table) => issue an ARP request: an Ethernet frame sent in broadcast
• Sender, ARP request (broadcast): "Who has IP = α?"
• Destination (IP = α, MAC = µ), ARP response (point to point): "That's me (MAC @ = µ)"
ARP - duplication detection (1/2)
When I configure the IP address => ARP request: broadcast an Ethernet frame
• Source, configuring IP address = α, ARP request (broadcast): "Who has IP = α?"
• Device with an address collision (IP = α, MAC = µ), ARP response (point to point): "That's me (MAC address = µ)"
=> The address is already in use
ARP - duplication detection (2/2)
When I configure the IP address => ARP request: broadcast an Ethernet frame
• Source, configuring IP address = α, ARP request (broadcast): "Who has IP = α?"
• No response after 1 s => OK, I can use the address
IPv6 address configuration
Addressing scheme
!RFC 4291 defines current IPv6 addresses
• loopback (::1)
• link local (fe80::/10)
• global unicast (2000::/3)
• multicast (ff00::/8)
!Use CIDR principles
• Prefix / Prefix length notation
• 2001:660:3003::/48
• 2001:660:3003:2:a00:20ff:fe18:964c/64
! Interfaces have several IPv6 addresses
• At least a link local and a global unicast address
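The address classes and the prefix / prefix length notation above can be explored with Python's standard `ipaddress` module. The helper name `classify` is an assumption made for this sketch; the module itself and the properties used (`is_loopback`, `is_link_local`, `is_multicast`) are part of the standard library.

```python
import ipaddress

GLOBAL_UNICAST = ipaddress.IPv6Network("2000::/3")


def classify(addr):
    """Map an IPv6 address to one of the classes listed on the slide."""
    ip = ipaddress.IPv6Address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:          # fe80::/10
        return "link local"
    if ip.is_multicast:           # ff00::/8
        return "multicast"
    if ip in GLOBAL_UNICAST:      # 2000::/3
        return "global unicast"
    return "other"


# CIDR notation: an interface address carries both the address and its prefix.
iface = ipaddress.IPv6Interface("2001:660:3003:2:a00:20ff:fe18:964c/64")
site = ipaddress.IPv6Network("2001:660:3003::/48")
print(classify(str(iface.ip)), iface.network, iface.network.subnet_of(site))
```

The last line shows that the /64 of the interface is one subnet of the /48 given as the other example.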
An IPv6 address - 128 bits
!Global unicast address
• Example: 2001:660:3003:1:34CA:3B73:6543:210F/64
!Link local address

Let’s assume you want to configure it yourself
!You need:
• Prefix information
• Interface ID

Interface ID assignment
! Derived from a L2 ID (e.g., MAC address)
! Manually assigned
• To keep the same address when the Ethernet card or host is changed
• To easily remember the address
- 1, 2, 3...
- Last digit of the v4 address
! Random value
• Changes frequently (e.g., every day, per session, at reboot)
• Helps preserve anonymity
! Hash or other value
• To link the address to other properties
- public key
- list of assigned prefixes
• Mainly for security purposes

Neighbor Discovery
!IETF protocols
• RFC 4861 : Neighbor Discovery for IPv6
• RFC 4862 : IPv6 Stateless Address Autoconfiguration
• RFC 4135 : Goals of Detecting Network Attachment in IPv6
!Mechanisms
• Router discovery
• Prefix discovery
• Address resolution
• Address Auto-configuration

Router Advertisement
!Periodic message sent by routers
!Message sent in response to a Solicitation
!ICMPv6 message
[Figure: Router Advertisement message]

Possible Options
! Source Link Layer address
• Source link layer address of the router
! MTU
• Indicates the recommended size for the MTU on the link
! Prefix Information
• Indicates the prefix(es) used on the link

Router and prefix discovery
! Identification of the prefix(es) used on the link
• Determine the destinations that are on-link
• Prefix used for address auto-configuration
! Identification of the on-link router(s) and default router
! The information from a router is only a part of the information of the link
• The reception of an RA must not erase the previous configuration

Receipt of an RA
!The node adds the source address of the RA to its router list
!If the Source Link Layer option is set, the node registers the association between the link local address and the link layer address in the Neighbor Cache
!Prefix option
• If the OnLink flag is set, it means that the prefix is used on the link
• Add this prefix to the list of prefixes

Next hop determination
! Usage of a set of tables
• Destination cache
• Prefix list
• Default router list
• Neighbor cache
! Algorithm
• Look for the destination address in the Destination Cache
• Compare the destination IP address with prefixes => longest prefix match
- If the destination is on-link, the destination address remains unchanged
- If the destination is not on-link, the node selects a router as a next hop
• Determination of the link layer address
• Memorization of the choice in the Destination Cache
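The algorithm above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `next_hop` and the plain dict / list tables are hypothetical, the first default router is always chosen, and the link layer address determination step is left out.

```python
import ipaddress


def next_hop(dest, prefix_list, default_routers, destination_cache):
    """Return the IPv6 address the packet must be handed to on the link."""
    # 1. Look for the destination in the Destination Cache.
    if dest in destination_cache:
        return destination_cache[dest]

    # 2. Compare the destination with the on-link prefixes.
    addr = ipaddress.IPv6Address(dest)
    on_link = any(addr in ipaddress.IPv6Network(p) for p in prefix_list)

    if on_link:
        hop = dest                      # on-link: the address is unchanged
    elif default_routers:
        hop = default_routers[0]        # off-link: go through a router
    else:
        raise LookupError("destination is off-link and no router is known")

    # 3. Memorize the choice in the Destination Cache.
    destination_cache[dest] = hop
    return hop
```

For the on-link decision any matching prefix is sufficient, so the sketch does not need to rank matches by length.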
Link layer address determination
!Use of the following messages
• Neighbor Advertisement
• Neighbor Solicitation
!Assumption: The node knows the destination IP address, and looks for the link layer address
• Send a Neighbor Solicitation
• Receive a Neighbor Advertisement from the target destination
• Record the correspondence between the IP address and the link layer address in the Neighbor Cache

Auto-configuration
! Goal: create a valid address for a given link, without human intervention
! Several steps
• Create a link local address by concatenating the link local prefix and an interface identifier derived from the link layer address of the interface
- Prefix: fe80::/64
- Link layer address: 00:16:cb:b9:50:bd
- EUI-64 identifier of the interface: 02:16:cb:ff:fe:b9:50:bd
- Generated address: fe80::216:cbff:feb9:50bd
• Check the address uniqueness - Duplicate Address Detection
- Send a Neighbor Solicitation for the address that has been created
• Reception of the prefix sent by the router (Router Advertisement)
• Creation of the global IP address by concatenating the prefix with the interface identifier
- Prefix: 2001:660:7301:d170::/64
- Link layer address: 00:16:cb:b9:50:bd
- Global address: 2001:660:7301:d170:216:cbff:feb9:50bd
• Optional check of the uniqueness of the global IP address
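The address construction in these steps can be reproduced with a short sketch. The function names `mac_to_interface_id` and `build_address` are assumptions made for the example; the conversion itself (flip the universal/local bit of the first byte, insert ff:fe in the middle) is the EUI-64 derivation shown in the slide's own numbers.

```python
import ipaddress


def mac_to_interface_id(mac):
    """Turn a 48-bit MAC address into a 64-bit interface identifier."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                                  # flip the universal/local bit
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return int.from_bytes(eui64, "big")


def build_address(prefix, mac):
    """Concatenate a /64 prefix with the identifier derived from the MAC."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    return ipaddress.IPv6Address(int(net.network_address) | mac_to_interface_id(mac))


mac = "00:16:cb:b9:50:bd"
print(build_address("fe80::/64", mac))                 # link local address
print(build_address("2001:660:7301:d170::/64", mac))   # global address
```

Duplicate Address Detection is not modelled here; the sketch only covers the two concatenation steps.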
Link layer address - EUI-64 Conversion
[Figure: conversion of a link layer (MAC) address into an EUI-64 identifier]

Neighbor Unreachability Detection
!Check whether a device is reachable
• In case of mobility, check that the previous router is still on-link
!Send a Neighbor Solicitation to the target IP address

Stateless auto-configuration
[Figure: a router (FE80::IID1, α::IID1/64) and a host (FE80::IID2, α::IID2/64) on the same link, with a timeline from t=0 to t=6.]
!t=0 - Router is configured
• The router is configured with a link-local address and manually configured with a global address (the prefix α::/64 is given by the network manager)
!t=1 - Link local address configuration
• The host attaches to the link and builds its link-local address based on the interface MAC address
!t=2 - DAD on link local address
• The host performs a DAD, i.e., sends a Neighbor Solicitation (FE80::IID2) to query resolution of its own address
• No answer after a timeout => no other host has this value, ok
!t=3 - Request for a Router Advertisement
• The host sends a Router Solicitation to the All Routers multicast group using the newly configured link-local address
!t=4 - Receives a Router Advertisement
• The router answers the host directly using link-local addresses: Router Advertisement (α)
• The answer may contain one or several prefix(es)
• The router can also mandate hosts to use DHCPv6 to obtain prefixes (stateful auto-configuration) and / or other parameters (DNS servers, ...)
!t=5 - DAD on global address
• The host performs a DAD, i.e., sends a Neighbor Solicitation (α::IID2) to query resolution of its own global address
• No answer after a timeout => no other host has this value, ok
!t=6
• The host sets the global address and configures the answering router as the default router

In summary... Neighbor Discovery
!Determine the link layer addresses of neighbors
!Address auto-configuration (stateful and stateless)
• Layer 3 parameters (IPv6 address, default route, MTU and Hop limit)
!Duplicate Address Detection (DAD)
!Maintain neighbor reachability

The transport layer

Let’s start with User Datagram Protocol
[Figure: TCP/IP stack. Link: Ethernet / IEEE 802.3. Network: IP, ARP, ICMP. Transport: UDP, TCP. Application: ping, DHCP, traceroute.]

UDP
!User Datagram Protocol (RFC 768)
!Applications/protocols using UDP
• NFS (Network File System)
• DNS (Domain Name System)
• DHCP (Dynamic Host Configuration Protocol)
• Multimedia applications (interactive voice and video, streaming)

What is proposed by UDP
!Minimal support at the transport level
• No retransmission of lost packets, no flow control, no congestion control
!A datagram service
• No connection
• Not reliable
- Reliability may be implemented by the application

UDP datagrams and IP packets
!Information unit (PDU) = datagram
!No segmentation
• When an application generates data, it is encapsulated in a UDP datagram
• Potential risk of fragmentation at the IP layer
[Figure: an IP packet = IP header + UDP datagram; a UDP datagram = UDP header + data]

UDP header
Bits 0-15                 | Bits 16-31
Source port number        | Destination port number
Length                    | Checksum
Data (if any)
(each row is 32 bits wide; the header is 8 bytes)
!Port number: demultiplexing
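As an illustration of this layout, the four 16-bit header fields can be packed and unpacked with Python's standard `struct` module. The function names `build_datagram` and `parse_datagram` are assumptions made for the sketch, and the checksum is simply left at 0 here.

```python
import struct

# Four unsigned 16-bit fields in network byte order: 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend a UDP header to the payload (Length covers header + data)."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_datagram(datagram):
    """Split a datagram back into its header fields and its data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER.size:length]
```

On reception, the destination port is the field used to hand the data to the right application (demultiplexing).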

Checksum
!UDP checksum: optional in IPv4 (mandatory in TCP)
• If not computed: checksum = 0
!Method (same as TCP)
• One's complement sum of the 16-bit words, then complement every bit of the result (c_i + 1 modulo 2, i = 0..15)
• Computed over header + data + IP « pseudo header »
Bits 0-15                 | Bits 16-31
Pseudo header:
Source IP address (32 bits)
Destination IP address (32 bits)
0 | Protocol (= 17)         | Length
UDP header:
Source port number        | Destination port number
Length                    | Checksum
Data, followed by a 0 padding byte if needed (padding only for the calculation)
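The checksum method can be sketched as below for UDP over IPv4. The function names are assumptions made for the example, and the segment passed to `udp_checksum` is expected to have its checksum field set to 0. A computed value of 0 is sent as 0xFFFF, since 0 means that no checksum was computed.

```python
import ipaddress
import struct


def ones_complement_sum(data):
    """16-bit one's complement sum, with a zero pad byte if the length is odd."""
    if len(data) % 2:
        data += b"\x00"                             # padding, only for the calculation
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)    # end-around carry
    return total


def pseudo_header(src_ip, dst_ip, udp_length):
    """Source IP, destination IP, zero byte, protocol 17, UDP length."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip, dst_ip, segment):
    """Checksum of a UDP segment whose checksum field is currently 0."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    checksum = ~ones_complement_sum(data) & 0xFFFF  # complement every bit
    return checksum or 0xFFFF


def verify(src_ip, dst_ip, segment):
    """A segment carrying a correct checksum sums to 0xFFFF."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

The receiver does not recompute and compare: it sums everything, checksum included, and checks that the result is all ones.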

Plan: Transmission Control Protocol
[Figure: TCP/IP stack. Link: Ethernet / IEEE 802.3. Network: IP, ARP, ICMP. Transport: UDP, TCP. Application: ping, DHCP, traceroute.]

Outline
!TCP characteristics
!TCP Connection
!Flow control
!Congestion control

Introduction to TCP
!Examples of applications / protocols using TCP
• HTTP (web), SMTP / POP / IMAP (e-mail), FTP, telnet, ssh...
• YouTube!
!Share of TCP traffic in the Internet (McCreary and Claffy, 2000)
• 91% of bytes
• 83% of packets
!Documents from the IETF
• RFC 793 (Base specification)
• RFC 1122 (Requirements for Internet Hosts)
• RFC 2581 (Congestion Control)
• RFC 2988 (Retransmission Timer)
• RFC 1323 (Extensions for High Performance)
• RFC 2018, 2883 (Selective Acknowledgment)
• …
!De facto standard: BSD implementation
• Several thousand lines of code

What is TCP providing to the applications?
!A transparent, error-free, bidirectional channel that transports a sequence of bytes
• End-to-end protocol
• Reliable
• Byte-stream
• Connection-oriented protocol

End-to-end
[Figure: two end hosts, a web client and a web server, each running http (application layer) over TCP over IP, connected through a router (node) that only goes up to IP. One link is an Ethernet LAN, the other a PPP serial link. The IP, Ethernet and PPP protocols operate hop by hop; TCP and http operate end-to-end between the two hosts.]

Reliable service
!TCP makes the assumption that the network is not reliable
• Acknowledgement of received data
• Retransmission of lost data
• Flow control
• Re-sequencing of data that arrive out of order
• Discarding of duplicated data
• Data integrity check (checksum)

Byte stream
!TCP transports bytes
• Unstructured flow: TCP may segment data arbitrarily
- TCP does not expose packet boundaries to the upper layer; framing is up to the application
• A sequence number is associated with each transmitted byte
- Allows detecting losses and numbering the ACKs

TCP segments and IP datagrams
!Information unit (PDU) = segment
• Application data are split into blocks, transported as TCP segments
• Each TCP segment is encapsulated in an IP datagram
[Figure: an IP datagram = IP header + TCP segment; a TCP segment = TCP header + data]

TCP header
Bits 0-15                                                  | Bits 16-31
Source port number                                         | Destination port number
Sequence number (32 bits)
Acknowledgment number (ACK) (32 bits)
Length (0-3) | Res. (4-7) | ECN (8-9) | Flags (10-15)      | Size of the window
Checksum                                                   | Pointer to urgent data
TCP options (if any)
Data (if any)
(the header without options is 20 bytes)

Port numbers
!Well known
• From 0 to 1023
• Used for usual services
- Examples: telnet server (23), ssh (22), http (80)...
!Registered
• From 1024 to 49151
- Examples: Quake (26000), SIP (5060)
!« Temporary » (dynamic)
• From 49152 to 65535
• Dynamically allocated by the application
- Typically by a client, such as an http client or ssh client
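The three ranges can be expressed as a small lookup. The function name `port_range` and the returned labels are choices made for this sketch; the middle range (1024 to 49151) is the one IANA calls "registered".

```python
def port_range(port):
    """Name the range a TCP or UDP port number belongs to."""
    if not 0 <= port <= 65535:
        raise ValueError("a port number is a 16-bit value")
    if port <= 1023:
        return "well known"
    if port <= 49151:
        return "registered"
    return "dynamic"


for name, port in [("telnet", 23), ("http", 80), ("SIP", 5060), ("Quake", 26000)]:
    print(name, port, port_range(port))
```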

Flags from the TCP header
!Used for signaling
• URG: the pointer to urgent data field is valid
• PSH: the TCP receiver must quickly pass the data to the upper layers
• RST: reset the connection
• ACK: the Ack number field carries a valid value
• FIN: the TCP sender has no more data to send
• SYN: synchronize the sequence numbers at both ends of the connection

Sequence number and acknowledgement
!Sequence number = first byte of data in the segment
!Ack number = next byte that the sender of the ACK expects to receive
!Full-duplex connection
• A sequence number for each direction
[Figure: A sends B a segment with sequence number = X carrying N bytes of data, [X, X+N–1]. B answers with ACK number = X+N. The next segment has sequence number = X+N and carries M bytes of data, [X+N, X+N+M–1].]

Ack mechanism
!Positive acknowledgement
• The receiver confirms what it has received in sequence
- Does not spontaneously notify that data is missing
- Does not explicitly indicate what is missing
!« Cumulative » property
• An ACK may acknowledge more than one received segment
• Delayed ACK mechanism: send e.g. one ACK for every other segment
[Figure: cumulative positive acknowledgement. After an initial ACK X, segments [X, X+N-1] and [X+N, X+N+M-1] are acknowledged by a single ACK X+N+M; segments [Y, Y+J-1] and [Y+J, Y+J+K-1] are acknowledged by a single ACK Y+J+K.]

Retransmission and acknowledgement of TCP segments
!Reliable service = retransmission if loss
[Figure: the sender writes N bytes (seq = X), then M bytes (seq = X+N), then K bytes (seq = X+N+M). The first segment is lost: the receiver buffers the out of sequence data and answers ACK X. When the retransmission timeout (RTO) expires, the sender retransmits seq = X. The receiver answers ACK X+N+M+K and the N+M+K bytes are sent to the application in the right order.]
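The receiver side of this scenario can be sketched as follows. This is a deliberately simplified model, with assumed names (`Receiver`, `on_segment`): segments never overlap partially, and anything starting before the next expected byte is treated as a duplicate and discarded. Since the ACK number is the next byte expected, the final cumulative ACK after N+M+K bytes starting at X is X+N+M+K.

```python
class Receiver:
    """Toy TCP receiver: buffers out of sequence data, sends cumulative ACKs."""

    def __init__(self, initial_seq):
        self.next_expected = initial_seq   # value carried in the ACK number
        self.buffered = {}                 # seq -> data waiting for a gap to fill
        self.delivered = bytearray()       # bytes handed to the application

    def on_segment(self, seq, data):
        """Process one segment and return the ACK number to send back."""
        if seq >= self.next_expected:
            self.buffered.setdefault(seq, data)
        # Deliver everything that is now contiguous, in the right order.
        while self.next_expected in self.buffered:
            chunk = self.buffered.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected
```

With X = 100 and segments of 3, 2 and 4 bytes, losing the first segment yields ACK 100 twice, then ACK 109 once the retransmission arrives.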

Outline
!TCP characteristics
!Connection
• Opening and closing
!Flow control
!Congestion control

Connection set up and tear down
!Before sending data, a connection establishment is needed
• Signaling : three-way handshake
!Typical phases of a TCP connection
• Establishment
• Data exchange
• Closing

Opening the connection
[Figure: three-way handshake between a client and a server]
• Server application: passive opening
• Client application: active opening; client -> server: SYN j
• Server -> client: SYN k, ACK j+1
• Client -> server: ACK k+1
• The connection is then established on both sides
Three segments to open the connection: three-way handshake

Closing the connection
[Figure: connection tear down between a client and a server]
• Client application: active closing; client -> server: FIN m
• Server -> client: ACK m+1; the closing is notified to the server application
• Server application: passive closing; server -> client: FIN n
• Client -> server: ACK n+1; the closing is notified to the client application
Four segments to close the connection

Outline
!TCP characteristics
!Connection
!Flow control
• Sliding window
!Congestion control

Flow control
!End-to-end
!Objectives
• Prevent the sender from sending data faster than the TCP receiver can accept it
• Better exploit the network capacity
!How is this done?
• TCP controls the rate at which segments are sent

Flow control in TCP
!« Sliding window »
• Idea:
- The sender can send data without having received an ack for what has already been sent (efficient usage of the capacity) ...
- ... as long as the receiver is able to receive new segments (flow control)
• Principle: each TCP endpoint indicates the number of bytes it is ready to receive, counted from the ack number
[Figure: bytes n° 1 to 18 as seen by the sender, split into four zones: sent and acknowledged; sent but not acknowledged; may be sent without delay; cannot be sent yet. The window indicated by the receiver covers the second and third zones; the third zone is the useful window.]

Sliding window: evolution over time
!The receiver acknowledges correct reception => the window closes
• Header: Ack number (32 bits)
!The receiver reads (acknowledged) data => the window opens
• Header: Window size (16 bits)
!Closing + opening = the window moves forward (slides)
[Figure: the sending window, with the closing edge on one side and the opening edge on the other]

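Sliding window: an example
[Figure: exchange between hosts A and B, rebuilt here as a table]
Step | Direction          | Segment
1    | sender -> receiver | SYN 0
2    | receiver -> sender | SYN 0, ACK 1, MSS = 1024, window = 4096
3    | sender -> receiver | ACK 1
4    | sender -> receiver | [1, 1024]
5    | sender -> receiver | [1025, 2048]
6    | sender -> receiver | [2049, 3072]
7    | receiver -> sender | ACK 2049, window = 4096
8    | receiver -> sender | ACK 3073, window = 3072
9    | sender -> receiver | [3073, 4096]
10   | receiver -> sender | ACK 4097, window = 4096
11   | sender -> receiver | [4097, 5120]
12   | sender -> receiver | [5121, 6144]
13   | sender -> receiver | [6145, 7168]
14   | receiver -> sender | ACK 6145, window = 4096
15   | sender -> receiver | [7169, 8192]
Steps 1 to 3: opening of the connection. Steps 4 to 15: data transfer.
Window states annotated in the figure: initial window, slide, closing, opening and slide.
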
Flow control with sliding window
!Intuitively:
• The higher the bandwidth, and/or
• The larger the RTT,
• The larger the window must be to allow the sender to send data continuously
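This intuition corresponds to the bandwidth-delay product: to keep sending without pause, the sender needs a window of at least bandwidth × RTT. A minimal sketch follows; the function names and the numeric values in the example are assumptions chosen for illustration.

```python
def min_window_bytes(bandwidth_bps, rtt_s):
    """Window needed to keep a path of this bandwidth and RTT full."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes, rtt_s):
    """Best achievable rate when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


# A 10 Mbit/s path with a 100 ms RTT needs roughly 125 000 bytes in flight.
print(min_window_bytes(10_000_000, 0.1))
# A 16-bit window field (at most 65 535 bytes) caps the same path near 5.2 Mbit/s.
print(max_throughput_bps(65_535, 0.1))
```

The second figure shows why a larger window is needed on paths with high bandwidth or long delay.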

Outline
!TCP characteristics
!Connection
!Flow control
!Congestion control
• Algorithms: Slow start, Congestion avoidance, Fast retransmit, Fast recovery

Congestion control
! Packets are in the network
• On the links
• In the router queues
! Example: intermediate link at low rate; window = 20
• Distance between ACKs: given by the slowest link
• If the R1 queue is full => packet loss
[Figure: sender, router R1, bottleneck link, router R2, receiver. Data segments 1 to 20 are in flight in one direction and ACKs in the other. [Stevens, 1994]]

About congestion...
!The sliding window guarantees that the receiver is not flooded
!Problem: what if the bottleneck is in the network, and not in the receiver?
• Possible causes
- slow link
- sum of the incoming flows > capacity of the link
!(In principle…) We cannot know the state of the intermediate nodes
• This state keeps changing

Congestion control in TCP
!Set up by the sender
• Mechanisms
- Congestion detection
- Reaction to detection
!(Up to) four algorithms work together
• Slow start
• Congestion avoidance
• Fast retransmit
• Fast recovery
!Sending window: wnd = min( rwnd, cwnd )
• rwnd := receiver window size
- Flow control by the receiver
• cwnd := congestion window
- Flow control by the sender

Congestion control algorithms
!Slow start
• Start with precautions
- cwnd = 1 or 2 segments
– At the beginning of the connection (we don’t know the status of the network!)
– When the timeout expires
• Use the ACK arrival rate to adapt the sending rate (self-clocking)
- Lightly loaded network
– Quick response from the network
- Loaded network
– ACKs take more time to arrive => slower start
!Congestion avoidance
• Once we have reached the congestion point, (try to) avoid reaching it again
• But still try to use the bandwidth efficiently

Slow start
!Algorithm
• With each new received ACK: cwnd <- cwnd + 1
!Without losses, cwnd increases (quasi-)exponentially
• Growth rate: (more or less) x 2 per RTT
[Figure: cwnd (in TCP segments) growing over successive round trips; segments labelled 1, 2, 3, 4, 5, 6, 7, 8, 9 ... 16]
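The doubling per RTT can be checked with a small simulation. This is an idealized sketch under explicit assumptions: no loss, no ssthresh, one ACK per segment (no delayed ACKs), and a full window sent every round trip. The function name `slow_start` and its parameters are hypothetical.

```python
def slow_start(rtts, rwnd, initial_cwnd=1):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rtts):
        in_flight = min(rwnd, cwnd)   # sending window: wnd = min(rwnd, cwnd)
        cwnd += in_flight             # each returning ACK adds one segment
        history.append(cwnd)
    return history


print(slow_start(4, rwnd=64))   # receiver window never limiting
print(slow_start(4, rwnd=4))    # receiver window limits the number of ACKs
```

In the second call the growth stops being exponential once the receiver window, not cwnd, bounds what can be sent per round trip.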

cwnd evolution: textbook case
[Figure: cwnd as a function of time (in RTTs), starting at 1. Successive phases: slow start; fast recovery + congestion avoidance (twice); an RTO interval; slow start + congestion avoidance. The ssthresh level and the network congestion level are marked on the plot.]