The Rapid Fire Survey of IP / UDP / TCP

Dirk Grunwald, Assoc. Professor

Dept. of Computer Science, University of Colorado, Boulder

Review

IP (Internet Protocol) is designed to connect networks that
• are possibly managed by multiple organizations / people
• may have different physical connections
• may be connected via a sequence of arbitrary intermediaries

A layered approach is used to simplify application & protocol design

Protocol Layering

[Figure: two protocol stacks, FTP / TCP / IP / Ethernet on one host and FTP / TCP / IP / Token Ring on the other, joined at the IP layer.]

Review

The link layer deals with the actual transport of bits across a physical medium.

The network layer abstracts the characteristics of the different link layers to a common layer (e.g. IP) and provides management functions at that layer.

The transport layer adds various features:
• Reliable communication (TCP)
• Arbitrary message sizes (UDP)

The application layer is the API provided to the programmer. Protocols are defined above that.

Problems to identify & solve

Addressing
• How do we “name” applications?
• How do we “name” connections?
• How do we “name” computers?
  • For humans
  • Across networks
  • Within a physical network

How do we deal with a decentralized organization?
• Who arbitrates decisions?
• Who defines standards?

How do we deal with a plurality of physical networks?

Naming & Addresses

Addresses are defined across three layers

• Physical / link level: Medium Access Control (MAC)
• Network / IP level: IP address
• Transport / application level: ports

Media Access and Control

Media can be arbitrated or be susceptible to collision
• Arbitrated - Token Ring, or 802.11 in PCF mode
• Collision - Ethernet, or 802.11 in ad hoc mode

A “Collision domain” includes all the nodes that may be affected by a collision

Hubs & Switches

A hub is a single collision domain, although it has a physical “hub and spoke” topology

A switch is a set of distinct collision domains.

Frames destined for another collision domain are “switched” from one domain to another

Addressing at the physical layer

“Ethernet” (or 802.3) networks specify a 48-bit physical “MAC address”

00-00-f8-75-5b-a6 -- Unique identifier for the network interface card (NIC)

Address ranges are assigned to specific vendors. E.g., the “00-00-f8” prefix of the address above belongs to Digital Equipment Corp.

Certain MAC addresses mean “broadcast”
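The address structure above can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_mac`, `vendor_prefix` and `is_broadcast` are made up for this example, and it assumes the all-ones address ff-ff-ff-ff-ff-ff is the broadcast address.

```python
def parse_mac(text):
    """Parse 'xx-xx-xx-xx-xx-xx' or 'xx:xx:xx:xx:xx:xx' into 6 raw bytes."""
    parts = text.replace(":", "-").split("-")
    if len(parts) != 6:
        raise ValueError("a MAC address has six octets")
    return bytes(int(p, 16) for p in parts)


def vendor_prefix(mac):
    """Return the first three octets, the part assigned to a vendor."""
    return mac[:3].hex("-")


def is_broadcast(mac):
    """True for the all-ones address, which every NIC accepts."""
    return mac == b"\xff" * 6


nic = parse_mac("00-00-f8-75-5b-a6")       # the example address from the slide
print(vendor_prefix(nic))
print(is_broadcast(nic))
print(is_broadcast(parse_mac("ff:ff:ff:ff:ff:ff")))
```

The 48 bits split into a vendor-assigned prefix and a per-card part, which is why the prefix alone is enough to look up the manufacturer.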

Addressing at the physical/link layer

Frames are “delivered” to NICs with that specific MAC address (or to all NICs, with broadcast)

A hub presents each frame to all NICs

A switch moves frames from one collision domain to another based on the MAC address

A table is maintained that specifies which MAC addresses are on which collision domain.

Frames destined for an unknown MAC address are broadcast to all collision domains
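The switch behaviour described above can be sketched as a small table-driven class. This is a simplified model, not any vendor's implementation: `LearningSwitch`, its port numbers and the string MAC addresses are all hypothetical, and real switches also age out table entries.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model: one port per collision domain, one MAC table."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}              # MAC address -> port it was last seen on

    def handle_frame(self, in_port, src, dst):
        """Return the list of ports the frame is sent out of."""
        self.table[src] = in_port    # learn where the sender lives
        if dst != BROADCAST and dst in self.table:
            out_port = self.table[dst]
            # Same collision domain: the destination already saw the frame.
            return [] if out_port == in_port else [out_port]
        # Broadcast or unknown destination: send to every other domain.
        return [p for p in self.ports if p != in_port]


switch = LearningSwitch([1, 2, 3])
print(switch.handle_frame(1, "00:00:f8:00:a3:f2", "00:a0:c9:49:22:f4"))
print(switch.handle_frame(2, "00:a0:c9:49:22:f4", "00:00:f8:00:a3:f2"))
```

The first frame is flooded because the destination is still unknown; the reply is switched to a single port because the table now knows where the first sender is.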

The reality of the world today

A 10-BaseT Ethernet NIC runs ~$9 for a cheapo PCI/ISA card.
10-BaseT via USB is ~$40. 100BaseT via PCI is $30.
A Gigabit NIC is ~$350.

A 4-port hub costs $40. Switches are >$70. Gigabit is much more (>$2000).

More Realities

Single nodes on switches allow you to use duplex communication

Send & receive concurrently

You need to use high-quality cabling (“Cat5”) for 100 Mb/s networks

Gigabit networks currently require fiber, but a copper cable standard is now available.

Modest network bandwidth & contention is a problem you throw money at, not brains.

TokenRing / FDDI

A “token” circulates among all computers. You can only transmit if you have the token.

Variations: more than one token, based on length, or e.g. WDM or FDM.

More Addressing

So, at the physical layer, Ethernet/802.3 uses a MAC address

• Can locate computers within a single physical network
• You want to limit network size - broadcast packets still affect the full network.

How do you address at the network and transport level?

IP Addressing

Each host in the internet has a unique 32-bit address
• I’m lying

There are three address types
• Unicast communication -- destined for a single host
• Broadcast communication -- destined for all hosts on a network
• Multicast communication -- destined for a set of hosts that belong to a multicast group.

Note the use of “network” and “host”
• Network IDs are assigned by the InterNIC

IP Addressing

Class  Leading bits  Remaining fields
A      0             netid/7, hostid/24
B      10            netid/14, hostid/16
C      110           netid/21, hostid/8
D      1110          multicast group/28
E      1111          (reserved)

Class  Range (as “dotted quad”)
A      0.0.0.0 to 127.255.255.255
B      128.0.0.0 to 191.255.255.255
C      192.0.0.0 to 223.255.255.255
D      224.0.0.0 to 239.255.255.255
E      240.0.0.0 to 255.255.255.255
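As a quick check of the class ranges, the class of an address can be read off its first octet. A minimal sketch; `address_class` is a name invented for this example.

```python
def address_class(dotted_quad):
    """Return the classful address class (A-E) from the first octet."""
    first = int(dotted_quad.split(".")[0])
    if first < 128:      # leading bit 0
        return "A"
    if first < 192:      # leading bits 10
        return "B"
    if first < 224:      # leading bits 110
        return "C"
    if first < 240:      # leading bits 1110
        return "D"
    return "E"           # leading bits 1111


for addr in ("128.138.241.78", "208.166.41.96", "127.0.0.1", "224.0.0.1"):
    print(addr, address_class(addr))
```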

Problems & Subnets

A few companies got class A networks (e.g., Digital, Xerox)

Many educational institutions got class B networks
• E.g., my primary computer is 128.138.241.78

Most people get class C networks.
• E.g., my cable modem in Palo Alto was 208.166.41.96

Allegedly, broadcasts would go to an entire network
• Obviously impractical for a Class A network: that’s 16,777,216 hosts
• We’ll discuss subnetting and routing later

Mapping names to numbers

Obviously, it’s hard to remember that 128.138.241.78 is my computer

But, numbers are more useful when actually switching messages

The Domain Name System (DNS) maps names to IP addresses
• A tree-structured distributed database and naming scheme
• Each separately administered subtree is a “zone”
• Network Solutions handles registration of each “top level domain” (e.g., colorado.edu).
• Sub-domains are then administered by individual groups
  • cs.colorado.edu
• We’ll discuss how names are “resolved” later

Transport Level Naming

Each NIC receives messages for a number of applications
• How do we differentiate the data intended for different apps?

Each IP connection has an associated 16-bit port number.
• Port numbers are contained in each TCP & UDP packet

Some port numbers are “well known services”
• E.g., telnet is always port number 23
• Port numbers from 0..1023 are for well known services.

Those port numbers are assigned by the Internet Assigned Numbers Authority (IANA)

Transport Naming in Unix

Unix uses “reserved ports” for security
• Only the superuser can bind to ports in the range of 0..1023.
• This is used for simplistic authentication

On most Unix systems, /etc/services lists the reserved ports

systat      11/tcp    users
daytime     13/tcp
daytime     13/udp
netstat     15/tcp
qotd        17/tcp    quote text
chargen     19/tcp    ttytst source
chargen     19/udp    ttytst source
ftp-data    20/tcp
ftp         21/tcp
ssh         22/tcp    # SSH Remote Login Server
ssh         22/udp    # SSH Remote Login Server

Representing TCP & UDP

UDP is a “datagram” or “message” oriented protocol
• Maps well to Ethernet, etc

TCP is “stream oriented”
• Appears to be an infinite stream of bytes
• This maps to frames by “packetization”

[Figure: a byte stream divided into a sequence of IP packets.]

Encapsulation

Application level communication typically has four levels of addressing
• Application information (e.g., HTML headers)
• Transport information (port)
• Network information (IP address)
• Link information (MAC address)

Each layer is “encapsulated” in the preceding layer.
• We “mux” or encapsulate the message when it’s sent
• We “demultiplex” the message when it arrives

Leads to layered software design

Encapsulation as it goes down the “protocol stack”

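[Figure: each layer prepends its own header as the data goes down the stack.]

App.:      App Hdr + User Data
TCP:       TCP Hdr (20) + App Hdr + User Data
IP:        IP Hdr (20) + TCP Hdr + App Hdr + User Data
Ethernet:  Ethernet Header (14) + IP Hdr + TCP Hdr + App Hdr + User Data + Ethernet trailer (4)

Header sizes are in bytes; the Ethernet frame carries 46-1500 bytes between its header and trailer.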

Demultiplexing

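[Figure: an incoming Ethernet frame is demultiplexed by the Ethernet driver to ARP, IP or RARP; IP demultiplexes on the IP header to TCP, UDP, ICMP or IGMP; TCP and UDP demultiplex on the TCP/UDP header to one of several applications.]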

Standards Bodies

Lots of arbitrary constants here!
• Naming, IP assignment, protocol & header formats, etc

Largely “volunteer” organization
• Internet Society -- "We are the most public secret cabal in the history of the world." - Jon Postel
• Internet Architecture Board (IAB) - technical oversight & coordination body
• Internet Engineering Task Force (IETF) - near-term, standards-oriented. Develops specifications that become internet standards
• Internet Research Task Force (IRTF) - R&D arm

Standards are embodied by RFC’s

Request for Comment (RFC)

Unique, monotonically assigned numbers. RFCs cannot be revised, only re-issued.

All RFCs are available on-line
• www.faqs.org (http://www.faqs.org/) has a nice searchable index
• www.ietf.org (http://www.ietf.org/) has information on drafts and working groups

Standards

Ethernet defined by Digital, Xerox and Intel

Later, the IEEE published a different set of standards
• http://grouper.ieee.org/groups/802/
• 802.2 defines a “logical link control” common to all 802 nets
• 802.3 covers many CSMA/CD networks
• 802.4 covers token bus networks
• 802.5 covers token ring networks
• 802.11 covers wireless Ethernet

Standards

In the IP world,
• RFC 894 defines IP-in-Ethernet
• RFC 1042 defines IP-in-802

The host requirements RFC says that all hosts connected to 10-Mbit Ethernet cable should
• Be able to send/receive using RFC 894
• Be able to send/receive a mix of RFC 1042 and 894 packets
• May be able to send packets using RFC 1042. If either can be sent, you must default to 894 packets

Ethernet & 802.3 Encapsulation

• Destination MAC or hardware address
  • Each NIC has a unique hardware address
• Source MAC or hardware address
• Protocol type, to allow sharing the same physical media with several different protocols
  • Type fields are defined by RFC 1700, which makes RFC 1340 obsolete
• Some data
• A checksum

Ethernet Encapsulation (RFC 894)

Field        Size (bytes)
Dest Addr.   6
Src Addr.    6
Type         2
Payload      46-1500
CRC          4

Type   Payload
0800   IP datagram (46-1500 bytes)
0806   ARP request/reply (28 bytes) + PAD (18 bytes)
8035   RARP request/reply (28 bytes) + PAD (18 bytes)

Variations

Observation
• Ethernet MAC information is fixed and can be pre-computed
• Data is typically fixed size
• Other fields (IP and TCP headers) can vary in size and also have CRC fields for end-to-end IP checksums

RFC 893 describes “trailer encapsulation”
• The IP and TCP headers move to the end of the frame
• Helps in computing IP checksum
• Allows more efficient use of scatter/gather DMA hardware

802.3 Encapsulation

Explicit length - number of bytes up to but not including the CRC

802.2 LLC - logical link control, common to all 802 networks and needed for e.g. wireless communication
• DSAP - destination service access point (0xaa)
• SSAP - source service access point (0xaa)
• Control field is set to 3

802.2 SNAP - sub-network access protocol
• Fixed origin code (0)
• Type field, as in the Ethernet type field

802.3 Encapsulation

802.3 MAC:   Dest Addr. | Src Addr. | Lth
802.2 LLC:   DSAP (AA) | SSAP (AA) | Control
802.2 SNAP:  Orig Code | Type
Then:        Payload | CRC

The payload has the same format as in Ethernet encapsulation.

SLIP - Serial Line IP

Specified in RFC 1055
• The IP datagram is terminated by the special END (0xc0) character. Most implementations transmit END at the start as well.
• If a byte in the IP datagram equals END, the 2-byte sequence 0xdb, 0xdc is transmitted (byte stuffing).
• 0xdb is the SLIP escape (ESC) character.
• If a byte in the IP datagram equals the SLIP ESC, the 2-byte sequence 0xdb, 0xdd is transmitted

SLIP Encapsulation w/Byte Stuffing

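[Figure: an IP datagram containing the bytes C0 and DB is transmitted as C0 ... DB DC ... DB DD ... C0, with END (C0) at both ends.]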
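The byte-stuffing rules can be sketched as a pair of functions. This is an illustrative sketch of the RFC 1055 framing described in the slides, not a serial-line driver; the function names are made up, and the decoder assumes it is handed exactly one frame.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD


def slip_encode(datagram):
    """Frame one datagram: END at both ends, END/ESC bytes stuffed."""
    out = bytearray([END])
    for byte in datagram:
        if byte == END:
            out += bytes([ESC, ESC_END])
        elif byte == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(byte)
    out.append(END)
    return bytes(out)


def slip_decode(frame):
    """Undo slip_encode for a single frame."""
    out = bytearray()
    it = iter(frame)
    for byte in it:
        if byte == END:
            continue                      # frame delimiter, carries no data
        if byte == ESC:
            escaped = next(it, None)
            if escaped == ESC_END:
                out.append(END)
            elif escaped == ESC_ESC:
                out.append(ESC)
            else:
                raise ValueError("bad SLIP escape sequence")
        else:
            out.append(byte)
    return bytes(out)


datagram = bytes([0x01, 0xC0, 0xDB, 0x02])
print(slip_encode(datagram).hex(" "))
```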

Problems with SLIP

Each endpoint must know the IP address of the other endpoint.

There’s no TYPE field -- thus, SLIP only supports a single protocol

There’s no checksum - thus, corrupted datagrams must be detected by higher layers, and all retransmissions are end-to-end

PPP - Point-to-Point Protocol

• Encapsulates IP datagrams on a serial link
• A Link Control Protocol (LCP) to establish, configure and test the data-link connection. This allows connection feature negotiation
• A family of Network Control Protocols specific to different network layer protocols
  • IP
  • OSI networks (X.25)
  • DECnet
  • AppleTalk

PPP Protocol

FLAG Addr Cntl Proto Payload CRC FLAG

Byte stuffing as in the SLIP/CSLIP protocol
• Bytes with values less than 0x20 are also escaped to avoid problems with flow-control

Most implementations can negotiate to eliminate the ADDR and CNTL fields and to shrink the protocol field to 1 byte, reducing overhead.

MTU

Most link layers have a limit on the size of an IP datagram, the Maximum Transmission Unit (MTU)

If an IP datagram > MTU, then it is fragmented (Chap 11.5)

Network              MTU (bytes)
Hyperchannel         65535
16Mb token ring      17914
4Mb token ring       4464
FDDI                 4352
Ethernet             1500
IEEE 802.3           1492
X.25                 576
PPP                  296

Path MTU

Messages traverse a route or path through a network. The smallest MTU along that path is called the Path MTU.

Not always constant, since the route between two nodes in the network can vary

Also, routing isn’t necessarily symmetric, and thus the A->B MTU may differ from the B->A MTU

RFC 1191 defines “path MTU discovery”, which is the process of automatically discovering the smallest MTU along a path.

Everyone does this

The IPv4 Protocol

IP is a “best effort connectionless” protocol
• It’s a datagram/packet oriented protocol
• You can get an IP packet from anyone without any “setup” or “connection establishment”

Packets are normally routed using destination routing
• You specify where the packet is to go, not how it gets there

You can optionally specify source routing
• You specify the route for the packet as part of the packet

Each packet is routed independently
• Can be delivered out of order
• Might not be delivered at all

Conventions in IPv4 - Network Byte Order

IP data is laid out in “Big Endian” order
• Byte transmission order: 0, 1, 2, 3

Representing a 16-bit integer in memory
• Big endian “0,1” - (SPARC, M68k)
• Little endian “1,0” - (x86, Alpha)

“Network byte order” is defined to be big endian

[Figure: a 32-bit word drawn as bytes 0, 1, 2, 3, with bit positions 0-15 and 16-31 marked.]

Conventions in IPv4

When we need to set fields in an IP header, we will need to use translation functions to be portable.

Actually, you need this for all binary fields

htons - host to network short (16-bit)
    port_number = htons (port_number);

ntohs - network to host short

htonl - host to network long (32-bit)
    htonl (interface_addr.get_ip_address ());

ntohl - network to host long
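The same conversions are available from Python's standard `socket` and `struct` modules, which makes it easy to see what bytes actually go on the wire. A small sketch; the port number and address are just the examples used elsewhere in these slides.

```python
import socket
import struct

port = 80
wire_port = struct.pack("!H", port)          # "!" means network (big endian) order
print(wire_port.hex())                       # 0050 on any host

wire_addr = struct.pack("!I", 0x808AF14E)    # 128.138.241.78 as a 32-bit integer
print(socket.inet_ntoa(wire_addr))

# htons/ntohs and htonl/ntohl are inverses on every host, whatever its byte order.
print(socket.ntohs(socket.htons(port)) == port)
print(socket.ntohl(socket.htonl(0x808AF14E)) == 0x808AF14E)
```

On a big endian host `htons` returns its argument unchanged; on a little endian host it swaps the two bytes. Code that always calls it is portable either way.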

What’s Stored in an IPv4 Packet?

• Version - 4-bit field specifying the IP version. Currently 4
• Header length - specified in 32-bit words. Range is 5..15 words, or 20..60 bytes
• Type of Service (8 bits)
  • 3-bit precedence field (ignored today), one “must be zero” bit
  • 4-bit field specifying desired service qualities.
    • Minimize Delay
    • Maximize Throughput
    • Maximize Reliability
    • Minimize Monetary Cost
  • Only one bit can be set. None set is “normal service”
  • Largely ignored by routers & IP implementations

What’s Stored in an IPv4 Packet

• Message (total) length, in bytes
• Datagram identification field that must be unique
  • Used with flags & fragment offset if a message must be fragmented
• Time to live field - upper limit on the number of “hops” a message can go before being dropped
• Protocol - identifies TCP, UDP, ICMP, etc
• Header checksum - checksum of just the IP header
• Source address
• Destination address
• Options

IPv4 Protocol Layout

Version (4 bits) | Hdr Lth (4) | Type of Svc (8) | Total length in bytes (16)
16-bit Packet Identification | Flags (3) | Fragment Offset (13)
Time To Live (8) | Protocol (8) | Header Checksum (16)
Source IP Address (32)
Destination IP Address (32)
... (options, if any) ...
Data

Parsing the IPv4 Packet

Data starts “Header Length” words into the packet; its size is “Total Length - Header Length”

Maximum IP datagram is 65535 bytes
• Hosts are not required to receive packets >576 bytes
• Ethernet MTU is 1500 bytes
• Most implementations allow for ~8192 byte IP datagrams (because of Network File System)
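Parsing the fixed 20-byte part of the header can be sketched with `struct`. This is an illustration only: `parse_ipv4_header` and the dictionary keys are invented for this example, the sample packet is hand-built with a zero checksum, and options are skipped rather than decoded.

```python
import socket
import struct


def parse_ipv4_header(packet):
    """Decode the fixed part of an IPv4 header and locate the data."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4          # header length is in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "total_len": total_len,
        "ident": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,   # stored in 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "data": packet[header_len:total_len],
    }


# Hand-built sample: 20-byte header, DF set, protocol 17 (UDP), 4 bytes of data.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 24, 0x1234, 0x4000, 64, 17, 0,
                     socket.inet_aton("128.138.241.78"),
                     socket.inet_aton("208.166.41.96")) + b"data"
fields = parse_ipv4_header(sample)
print(fields["src"], "->", fields["dst"], fields["data"])
```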

IPv4 Options

• Security & handling restrictions
• Have each router record its IP address
• Have each router record its IP address and timestamp
• Loose source routing - specify a list of IP addresses that must be traversed by the packet
• Strict source routing - enforce that list

60-byte limit on IP headers limits utility of these options

We need to worry about source routing when we talk about ad hoc routing

Hop-by-Hop IP Routing

Datagram arrives
• For this host? => Deliver to TCP, UDP, etc
• Else => Look up next hop in routing table
  • If there’s an entry, forward the message
  • Else => discard the datagram

[Figure: SRC and DST hosts connected through routers R1, R2 and R3.]

Routing Tables

Routing table contents
• Destination IP address (either HOST or NET address)
• IP address of the “next hop” router
• Flags (HOST/NET, Router/Direct)
• Network interface to use

Routing lookup:
• Search for an entry that matches the destination IP address
  • Handles directly connected or point-to-point links
• Search for an entry that matches the destination network
  • If found, send to the directly-connected router or interface
• Search for a default route
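The three-step lookup can be sketched with Python's `ipaddress` module. The table below is hypothetical (the host route and the string "direct" are made up for illustration), and the sketch takes the first matching network entry rather than the longest prefix, which is a simplification.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop for destination: host route, then net, then default."""
    dest = ipaddress.ip_address(destination)
    parsed = [(ipaddress.ip_network(net), hop) for net, hop in routes]
    for net, hop in parsed:                    # 1. exact host entry
        if net.prefixlen == 32 and dest in net:
            return hop
    for net, hop in parsed:                    # 2. matching network entry
        if 0 < net.prefixlen < 32 and dest in net:
            return hop
    for net, hop in parsed:                    # 3. default route
        if net.prefixlen == 0:
            return hop
    return None                                # no route: discard the datagram


# Hypothetical table for a host on the 140.252.13 Ethernet.
routes = [
    ("140.252.13.65/32", "140.252.13.35"),     # host route (assumed)
    ("140.252.13.0/24", "direct"),             # directly connected network
    ("0.0.0.0/0", "140.252.13.33"),            # default route
]
for dest in ("140.252.13.65", "140.252.13.34", "192.48.96.9"):
    print(dest, "->", lookup(routes, dest))
```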

Problems in Routing

Remember, IP routing is decentralized. How are routing tables established?

You can specify static routes, i.e., hard-coded information about your local network

A default route is usually specified via a static route, but it’s not sufficient

Routers share their local information using dynamic routing protocols that propagate local information across a large network

[Figure: SRC and DST hosts connected through routers R1, R2 and R3.]

Example Host Routing

[Figure, shown as a sequence of animation steps: hosts bsdi (.13.35) and sun (.13.33) share the Ethernet 140.252.13, and each also has a default route. A datagram addressed to 140.252.13.33 is placed in an Ethernet frame and delivered directly to sun.]

Example with Router

[Figure, shown as a sequence of animation steps: bsdi (.13.35) and sun (.13.33) are on Ethernet 140.252.13; .1.29, netb (.1.183) and gateway (.1.4) are on Ethernet 140.252.1. A datagram addressed to 192.252.1.183 is carried in a new link-layer frame at each hop while the destination IP address stays the same. The successive next hops, each chosen from a default route, are:]

1. Next = 140.252.13.33 (default)
2. Next = 140.252.1.183 (default)
3. Next = 140.252.1.4 (default)
4. Next = 140.252.104.2 (default)

Key notes

• All the hosts and routers used a default route
• The destination IP address never changed
• All routing decisions were made based on that destination address
• Different link-layer encapsulation schemes were used as the message went from Ethernet to CSLIP to Ethernet

Subnet Addressing

Routing is based on “networks”
• Routing for all nodes in a network is handled by a single router
• Class B - single address routes traffic for 65536 addresses
• Class C - single address routes traffic for 256 addresses

The original “network” field was unworkable for “network”-like things, since class A & B networks put too many hosts under a single “network” field

Hence, subnets - specified by RFC 950
• Imposes logical ordering, allowing many networks of fewer machines
• Hierarchical - still a single advertised router for a Class B network

Common Subnetting Sizes

Network ID = 128.138 (16 bits) | Subnetid (8 bits) | Hostid (8 bits)

Example, 128.138.241.78: Network ID = 128.138 | Subnetid (241) | Hostid (78)


Why different sizes?

It’s possible to have networks span multiple physical media
• VPN software

It’s possible to have multiple networks on a single physical medium

The ideal goal is to have a single network (subnet) per physical medium

All broadcast traffic is routed to that physical medium, so many networks on the same medium cause more traffic

More networks allows better “clustering” of network traffic

Subnet Masks - how are subnets specified?

Network ID = 128.138 | Subnetid | Hostid
Example: Network ID = 128.138 | Subnetid (241) | Hostid (78)

11111111 11111111 11111111 00000000    255.255.255.0
11111111 11111111 11111111 11000000    255.255.255.192

• Subnet mask has 1’s on the left, 0’s on the right
• The 0 bits specify which bits are the host id in an IP address
• Stevens corrections
  • Arbitrary bitmasks are not allowed
  • Subnet zero can be used
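Applying a mask is a bitwise AND. The sketch below works through the slide's own address, 128.138.241.78, with both example masks; `subnet_info` and the two converters are names made up for this illustration.

```python
import socket
import struct


def to_int(dotted):
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


def to_dotted(value):
    return socket.inet_ntoa(struct.pack("!I", value))


def subnet_info(address, mask):
    """Return (network address, host id, broadcast address) for a masked address."""
    addr, m = to_int(address), to_int(mask)
    host_bits = ~m & 0xFFFFFFFF                # the 0 bits of the mask
    network = addr & m
    return to_dotted(network), addr & host_bits, to_dotted(network | host_bits)


print(subnet_info("128.138.241.78", "255.255.255.0"))
print(subnet_info("128.138.241.78", "255.255.255.192"))
```

With the 26-bit mask the host sits on subnet 128.138.241.64 with host id 14, and the subnet broadcast address is 128.138.241.127.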

Given an IP address...

Select a router based on the IP address
• I.e., for Class B, use the upper 16 bits as a network specification
• For class C, use the upper 24 bits as a network specification

Route to that network (using routing tables..)

Then,
• That router uses the pre-specified subnet mask to select a subnet
• A subnet routing table is consulted and traffic is directed to that subnet

More hierarchical structure

Special IP Addresses

Special source addresses, used as part of an initialization procedure (e.g. bootp)
• This host on this network: NetID = 0, HostID = 0
• Specified host on this network: NetID = 0, HostID = that host

Loopback addresses
• Loopback address - allows applications on the same host to communicate using TCP/IP
• NetID = 127, HostID = anything

Special IP Output Addresses

Limited Broadcast - typically used for initialization
• Only appears on local cable/collision domain
• NetID = -1, HostID = -1

Net-directed Broadcast (to netid)
• Forwarded via router
• NetID = netid, HostID = -1

Subnet-directed Broadcast (to netid, subnetid)
• NetID = netid, SubnetID = subnetid, HostID = -1

All-subnets-directed broadcast for netid
• Most routers don’t support this - use Multicast instead to do the same thing
• NetID = netid, SubnetID = -1, HostID = -1

Loopback Devices

Loopback devices allow applications on the same host to talk to each other directly

No packet directed to the loopback device can appear on any physical network

Typical implementation results in…
• Loopback is typically implemented as another link layer under the network layer
• Everything sent to loopback (127.0.0.1) appears as IP input
• Datagrams sent to broadcast or multicast addresses are copied to the loopback interface and also sent on Ethernet
• Anything sent to one of the host’s own IP addresses is sent to the loopback device

Loopback Devices

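[Figure: flowchart of the IP output function. If the destination IP address equals a broadcast or multicast address, the datagram goes to the loopback driver (which places it on the IP input queue) and also to the Ethernet driver. If the destination IP address equals an interface address, it goes to the loopback driver. Otherwise ARP is used to get the destination Ethernet address and the datagram goes to the Ethernet driver. Received frames are demultiplexed on Ethernet frame type to IP (placed on the IP input queue) or ARP.]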

IFCONFIG - determining interface configurations on Unix systems

[foobar-39] ifconfig -a

tu0: flags=c22<BROADCAST,NOTRAILERS,MULTICAST,SIMPLEX>

tu1: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX>

inet 128.138.241.78 netmask ffffffc0 broadcast 128.138.241.127 ipmtu 1500

sl0: flags=10<POINTOPOINT>

lo0: flags=100c89<UP,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM>

inet 127.0.0.1 netmask ff000000 ipmtu 4096

Netstat - statistics & more

[foobar-40] netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

tu0* 1500 <Link> 08:00:2b:e4:c1:8c 0 0 0 0 0

tu1 1500 <Link> 00:00:f8:00:a3:f2 165378215 0 155667063 37 2792801

tu1 1500 128.138.241.64/26 128.138.241.78 165378215 0 155667063 37 2792801

sl0* 296 <Link> 0 0 0 0 0

lo0 4096 <Link> 88293425 0 88293510 0 0

lo0 4096 127/8 127.0.0.1 88293425 0 88293510 0 0

[foobar-41]

ARP / RARP / ICMP

ARP is a protocol for mapping an IP address to a MAC address

RARP is a protocol for “managing” a machine -- telling a machine what its IP address should be, based on the MAC address

ICMP is the internet control message protocol and is used to manage (& measure) many aspects of IP

ARP - The Problem

Once a packet has been routed to a specific network, we need to deliver it to the appropriate host

The host’s Ethernet interface only listens for its own Ethernet MAC address

We only have an IP address

Thus, we need to know how to map the IP address to a MAC address

ARP - Example

• FTP uses gethostbyname to determine the IP address of an FTP server
• FTP asks TCP to establish a connection
• TCP sends a connection request to that IP address, which is on the local network
• The O/S uses ARP to determine the Ethernet MAC address
• The destination O/S replies & the reply is received
• The IP layer can now send the packet

The sequence

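[Figure: the sending host’s stack (FTP, resolver, TCP, IP, ARP, Ethernet driver) alongside the Ethernet drivers, ARP, IP and TCP modules of the other hosts on the cable that receive the ARP broadcast.]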

Format of an ARP request

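Ethernet header:
• Ethernet Dest. Address
• Ethernet Src Address
• Frame Type

ARP request/reply:
• Hardware Type
• Protocol Type
• Hardware Size
• Protocol Size
• Op
• Sender Enet Addr
• Sender IP Addr
• Target Enet Addr
• Target IP Addr

Notice this! The hardware addresses carried inside the ARP message are separate from those in the Ethernet header. This is used by Proxy ARP.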
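The request layout can be sketched with `struct`. This only builds the bytes; it does not send anything. The function name is made up, the target IP address is a hypothetical neighbour, and the constants (hardware type 1, protocol type 0x0800, op 1, frame type 0x0806) are the standard values for ARP over Ethernet.

```python
import socket
import struct


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Return (ethernet_header, arp_message) for a who-has request."""
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                              # hardware type: Ethernet
        0x0800,                         # protocol type: IP
        6, 4,                           # hardware size, protocol size
        1,                              # op: request
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6,                    # target Ethernet address: not yet known
        socket.inet_aton(target_ip),
    )
    # Sent as a link-layer broadcast; frame type 0x0806 marks ARP.
    ethernet = struct.pack("!6s6sH", b"\xff" * 6, sender_mac, 0x0806)
    return ethernet, arp


mac = bytes.fromhex("0000f800a3f2")
ethernet, arp = build_arp_request(mac, "128.138.241.78", "128.138.241.65")
print(len(ethernet), len(arp), len(ethernet) + len(arp))
```

The ARP message is 28 bytes, so the unpadded frame is 42 bytes before the link layer pads it to the Ethernet minimum.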

Notes

• ARP uses a physical (Ethernet) broadcast to the network
• A unicast response is used to inform the sender of the appropriate MAC address

ARP responses are cached by the kernel

Everyone hears the broadcast request and can cache the sender’s address from it

You can use “arp” to see the ARP table

[dirk-linux-23] arp

Address HWtype HWaddress

foobar.cs.colorado.edu ether 00:00:F8:00:A3:F2

equium.cs.colorado.edu ether 00:A0:C9:49:22:F4

itsydev.cs.colorado.edu ether 00:A0:CC:50:BD:00

cs-gw3-esl.cs.colorado. ether 00:E0:F7:94:05:80

dirk-vmware.cs.colorado ether 00:A0:CC:50:C4:8A

Example ARP exchange

11:07:54.537688 0:0:f8:0:a3:f2 ff:ff:ff:ff:ff:ff arp 42: arp who-has ragtop.cs.colorado.edu tell foobar.cs.colorado.edu

11:07:54.538665 0:0:f8:75:5b:8c 0:0:f8:0:a3:f2 arp 60: arp reply ragtop.cs.colorado.edu is-at 0:0:f8:75:5b:8c

(In each line, the first address is the sender MAC and the second is the destination MAC.)

Proxy ARP

ARP replies are sent to the “Sender Hardware Address” of the request, and hosts cache the hardware address reported inside the ARP message

This can be different from the Ethernet Source Address of the reply!

Thus, host A can “reply for” host B, and all IP packets destined for B will be sent to A

Host A can then ensure they get to host B

Using Proxy ARP to “Bridge” Networks

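[Figure: the two-Ethernet topology from the router example: bsdi (.13.35) and sun (.13.33) on Ethernet 140.252.13; .1.29, netb (.1.183) and gateway (.1.4) on Ethernet 140.252.1; the address 192.252.1.183 is labelled.]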

Using ARP to spot configuration problems

At boot-up, many systems issue an ARP request for their own IP address.

If anyone responds, something is mis-configured

You can also use “gratuitous ARP” for rapid fail-over.
• Everyone (usually) snoops the sending hardware address
• Servers A & B have the same internal IP address, but A is dormant
• Server A listens for a “death song” from Server B
• Server A immediately sends an ARP request
• Everyone now thinks that A “is” the specified IP address

RARP

RARP is a reverse ARP request
• A host knows its MAC address, but not its assigned IP address
• It broadcasts an RARP “who-is” request
• An RARP server looks up the MAC address in a table (/etc/ethers) and replies with the IP address

The DHCP protocol provides the same functionality with better management

ICMP - Internet Control Message Protocol

• Communicates error and exceptional conditions
• Some ICMP messages cause errors to be returned to the user process

ICMP message layout, following the IP header:
8-bit type | 8-bit code | Checksum of ICMP message
(contents depend on type & code)

ICMP Types

Type  Description
0     Echo Reply
3     Destination Unreachable
4     Source Quench (simple flow control)
5     Redirect (chapter 9.5)
8     Echo Request (ping, chapter 7)
9     Router advertisement
10    Router solicitation
11    Time exceeded
12    Parameter Problem
13    Timestamp Request
14    Timestamp Reply
15    Information Request
16    Information Reply
17    Address mask Request
18    Address mask Reply

Error Reporting

ICMP never returns errors (e.g. “destination unreachable”) for…
• ICMP error messages
• A datagram destined for an IP broadcast address
• A datagram sent as a link-layer broadcast
• A fragment other than the first
• A datagram whose source address does not define a single host (zero, loopback, broadcast or multicast)

Avoids “broadcast storm”

Implies that protocols must be able to deal with dropped ICMP packets

ICMP Destination Unreachable Codes

Code  Description
0     Network unreachable
1     Host unreachable
2     Protocol unreachable (e.g., UDP not provided)
3     Port unreachable
4     Fragmentation needed, but don’t fragment set
5     Source route failed
6     Destination network unknown
7     Destination host unknown
8     Source host isolated
9     Destination network administratively prohibited
10    Destination host administratively prohibited
11    Network unreachable for TOS (9.3)
12    Host unreachable for TOS
13    Communication administratively prohibited by filtering
14    Host precedence violation
15    Precedence cutoff in effect

UDP Protocol

IP Datagram:   IP Header (20 bytes) | UDP Datagram
UDP Datagram:  UDP Header (8 bytes) | UDP Data

UDP Header

16-bit Source Port # | 16-bit Destination Port #
16-bit UDP Length | 16-bit UDP Checksum (opt)
Data (if any)

UDP Checksum

The checksum covers an IP pseudo-header, the UDP header and the UDP data:

IP Pseudo-Header:
32-bit Source IP address
32-bit Destination IP address
MBZ | Protocol | 16-bit UDP Length

UDP Header:
16-bit UDP Length | 16-bit UDP Checksum (opt)

Data (if any), plus a possible odd-byte PAD

UDP Checksum

Checksum calculated like the IP checksum, but uses the pseudo-IP header to ensure the packet arrived at the proper host

If the transmitted checksum field is zero, it means the sender didn’t compute the checksum.

If the computed checksum would be zero, it’s represented as 65535

Packets with checksum errors are silently discarded; the errors are not reported
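The checksum rules can be sketched as follows. This is an illustrative sketch, not kernel code: the helper names are invented, the ports and payload are arbitrary, and it assumes the caller passes the UDP header with the checksum field set to zero.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's complement sum, padding an odd trailing byte with zero."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)     # fold the carries back in
    return total


def pseudo_header(src_ip, dst_ip, udp_length):
    """Source, destination, zero byte, protocol 17 (UDP), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip, dst_ip, segment):
    """Checksum over pseudo-header + UDP header (checksum field 0) + data."""
    covered = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    checksum = ~ones_complement_sum(covered) & 0xFFFF
    return checksum or 0xFFFF      # a computed 0 is sent as 65535


data = b"hello"
segment = struct.pack("!HHHH", 1234, 8000, 8 + len(data), 0) + data
checksum = udp_checksum("128.138.241.78", "128.138.202.92", segment)
print(hex(checksum))
```

A receiver repeats the sum over the same bytes with the checksum filled in; the result is 0xFFFF when nothing was corrupted.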

IP Fragmentation

When a router transits a packet that is too large for the MTU of the outgoing link, the packet is fragmented

Fragmented packets are not reassembled until they reach their final destination

• Fragments may also be fragmented
• Fragments are identified using the datagram identification field
• Typically, if any fragment is lost, a router will discard all fragments. Routers usually only discover fragment loss if they drop the fragment themselves.
• The endpoint assumes fragments are lost after 30-60 seconds
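How a datagram is cut up can be sketched as a small function. This is a simplified model with a made-up name: it assumes every fragment carries a plain 20-byte header with no options, and it only computes the offsets and flags rather than building packets.

```python
def fragment(data_len, mtu, header_len=20):
    """Split data_len bytes of IP data for a link with the given MTU.

    Returns a list of (offset in 8-byte units, data bytes, more_fragments).
    """
    # Every fragment but the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < data_len:
        length = min(max_data, data_len - offset)
        more = offset + length < data_len
        fragments.append((offset // 8, length, more))
        offset += length
    return fragments


# 1481 bytes of IP data (one byte too many for a single Ethernet frame).
for frag in fragment(1481, 1500):
    print(frag)
```

Fragmenting a fragment works the same way, except that the new offsets are added to the fragment's own offset and “more fragments” stays set on every piece of a non-final fragment.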

Packets vs. Datagrams

An IP datagram is the unit of end-to-end transmission at the IP layer (before fragmentation & after reassembly)

A packet is the unit of data passed between the IP layer and the link layer.

A packet can be a complete IP datagram or a fragment

IP Fragmentation

[Figure: an IP datagram (IP header + payload) split into three fragments, each with its own IP header. “More Fragments” is set on all but the last fragment and is NOT set on the last.]

IP Fragmentation - Identifying Information

[Figure, shown in two animation steps: two fragments (IP header + payload) next to the IP header layout, in which the fields Ver, Hdr Lth, Type of Svc, Total length (in bytes) and Source IP Address are labelled.]

IP Fragmentation Of Non-Final Fragments

[Figure: a non-final fragment is fragmented again. Each resulting piece gets its own IP header, and “More Fragments” is set on all of them.]

IP Fragmentation Of Final Fragment

[Figure: the final fragment is fragmented again. Each resulting piece gets its own IP header; “More Fragments” is set on every piece except the last, where it is NOT set.]

Don’t Fragment

Hosts must be able to receive packets of 576 bytes, which means a 512-byte datagram won’t be fragmented

One of the IPv4 header flags specifies that this packet should not be fragmented

16-bit Packet Identification | Flags (Reserved, Don’t Fragment, More Fragments) | Fragment Offset

ICMP Unreachable Error

Attempting to fragment a packet with don’t fragment set generates an ICMP error packet

ICMP type “destination unreachable” (type 3), code “fragmentation required but don’t fragment set” (code 4)

Type (3) | Code (4) | Checksum
MBZ | MTU of next network hop
IP Header (including options) and first 8 bytes of original IP datagram data

MTU Discovery Using Don’t Fragment Packets

[Figure: nodes A, B, C and D joined by networks N1 (MTU = 1500), N2 (MTU = 875) and N3 (MTU = 770).]

1. Packet Size = 1500 -> ICMP Next = 875
2. Packet Size = 875 -> ICMP Next = 770
3. Packet Size = 770
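The exchange in the figure can be simulated in a few lines. This is a toy model with an invented function name: it assumes every router that cannot forward a don't-fragment packet answers with an ICMP error carrying the next-hop MTU, which real routers do not always do.

```python
def discover_path_mtu(link_mtus, first_try):
    """Simulate sending don't-fragment packets along a path.

    link_mtus lists the MTU of each network on the path, in order.
    Returns (path MTU, list of packet sizes tried).
    """
    size = first_try
    attempts = [size]
    while True:
        # The first link too small for the packet triggers the ICMP error.
        blocking = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocking is None:
            return size, attempts        # the packet reached the far end
        size = blocking                  # retry with the reported next-hop MTU
        attempts.append(size)


print(discover_path_mtu([1500, 875, 770], 1500))
```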

ICMP Source Quench

If a router / host discards datagrams due to buffer overflows, it may send an ICMP source quench message

I tried for 15 minutes to generate this on a slow host & was unable to do so

More likely to occur when e.g., routing to a dialup, but even that failed.

Can be used by a protocol to slow down transmission rate (e.g., TCP)

UDP Pragmatics (review from code)

UDP ports and TCP ports are separate name spaces
• UDP port 80 doesn’t mean the same thing as TCP port 80

UDP ports are unique to a specific interface
• Port 80 on loopback is not the same as port 80 on eth0

Most POSIX/UNIX systems let you specify “wildcards”
• INADDR_ANY is a special address (0.0.0.0) that is a wildcard interface address

Using netstat to see ports

[current-45] netstat -n -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 128.138.202.92:22       128.138.241.121:813     ESTABLISHED
tcp        0      0 0.0.0.0:6000            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:1024            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:758             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:113             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:79              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:512             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:513             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:514             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:23              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:21              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:37              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:13              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
udp        0      0 0.0.0.0:8000            0.0.0.0:*
udp        0      0 0.0.0.0:768             0.0.0.0:*
udp        0      0 0.0.0.0:770             0.0.0.0:*
udp        0      0 0.0.0.0:177             0.0.0.0:*

Using netstat to see interfaces

[current-45] netstat -n -a
….
udp        0      0 128.138.202.92:8000    0.0.0.0:*
udp        0      0 127.0.0.1:8000         0.0.0.0:*
udp        0      0 0.0.0.0:769            0.0.0.0:*
udp        0      0 0.0.0.0:768            0.0.0.0:*

System Calls Used

socket Create an endpoint on the local system

bind Specify the local interface and port for the endpoint

connect Specify the remote interface and port for the endpoint

setsockopt / getsockopt Modify various default properties

Bound & Connected Sockets

Until bind is called, a socket is not bound
Can’t receive messages (haven’t specified port)
When you send using an unbound socket, it’s bound to an ephemeral port

Until connect is called, a socket is not connected
Sending messages on an unconnected socket requires that you specify the destination address each time.
If you do call connect, you can only receive messages on the connected socket from the specified remote endpoint

POSIX socket interface

send Send a message on a connected socket

sendto Send a datagram to a specified IP address. The socket can be unconnected.

recv Receive a datagram from a bound socket

recvfrom Receive a datagram and record the source IP address

recvmsg Essentially like recvfrom, but arguments packed in a struct

One Last POSIX call - select

Select lets you wait on multiple file descriptors to become available, or for a timeout to occur

#include <sys/select.h>   /* older systems: <sys/time.h> */

int select(int nfds,
           fd_set *readfds,
           fd_set *writefds,
           fd_set *exceptfds,
           struct timeval *timeout);
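As a rough illustration of the timeout case, the sketch below wraps select for a single descriptor. The name `wait_readable` and the millisecond parameter are made up for this example; only the select call itself and the FD_* macros are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical wrapper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ret;

    assert(fd >= 0 && fd < FD_SETSIZE);   /* fd_set has a fixed capacity */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest descriptor number plus one */
    ret = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    return (ret > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

The same call works for UDP sockets: a client can wait a bounded time for a reply and retransmit when `wait_readable` returns 0.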

Common UDP “Server” Pattern

socket

setsockopt

bind

recvfrom

sendto

Common UDP “Client” Pattern

socket

setsockopt

sendto

recvfrom
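The two patterns can be put side by side in one process over the loopback interface. This is a minimal sketch under several assumptions: the helper names (`udp_server_socket`, `udp_echo_once`) are hypothetical, the server binds to port 0 so the kernel picks a free port, and a one-second receive timeout stands in for real error handling.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Server pattern: socket + bind. Port 0 lets the kernel choose a port,
 * which is reported back through port_out. Returns the fd or -1. */
int udp_server_socket(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* One request/reply exchange. Returns the reply length or -1. */
int udp_echo_once(const char *msg, char *reply, size_t reply_size)
{
    unsigned short port = 0;
    struct sockaddr_in server, peer;
    socklen_t peer_len = sizeof(peer);
    struct timeval tv = {1, 0};          /* 1 s receive timeout */
    char buf[512];
    ssize_t n;
    int result = -1;
    int srv = udp_server_socket(&port);
    int cli = socket(AF_INET, SOCK_DGRAM, 0);

    assert(reply_size > 0);
    if (srv < 0 || cli < 0)
        goto out;
    setsockopt(srv, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    setsockopt(cli, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    server.sin_port = htons(port);

    /* Client: sendto on an unbound socket; the kernel assigns an ephemeral port. */
    if (sendto(cli, msg, strlen(msg), 0, (struct sockaddr *)&server, sizeof(server)) < 0)
        goto out;
    /* Server: recvfrom records the sender, sendto answers it. */
    n = recvfrom(srv, buf, sizeof(buf), 0, (struct sockaddr *)&peer, &peer_len);
    if (n < 0)
        goto out;
    if (sendto(srv, buf, (size_t)n, 0, (struct sockaddr *)&peer, peer_len) < 0)
        goto out;
    /* Client: recvfrom picks up the reply. */
    n = recvfrom(cli, reply, reply_size - 1, 0, NULL, NULL);
    if (n < 0)
        goto out;
    reply[n] = '\0';
    result = (int)n;
out:
    if (srv >= 0) close(srv);
    if (cli >= 0) close(cli);
    return result;
}
```

A real server would loop on recvfrom / sendto and would usually bind to INADDR_ANY and a well-known port instead of loopback and port 0.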

Using Broadcast

UDP broadcast involves sending to explicit broadcast addresses

Most POSIX implementations require you explicitly enable broadcast

ret = setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

Only applicable to UDP!

Broadcast Addresses

Limited Net Broadcast - 255.255.255.255 Never forwarded by a router!

Net-directed Broadcast - e.g., 128.138.255.255 A router must forward a net-directed broadcast, but must have an option to disable this.

Subnet-directed Broadcast - e.g., 128.138.202.255
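The directed broadcast addresses above are the host address with every host bit set to one. A small sketch, assuming addresses are held in host byte order and the prefix length is known (the `IPV4` macro and `directed_broadcast` are illustrative names, not part of any API):

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from dotted-quad parts. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Set all host bits to one for the given prefix length (0..32). */
uint32_t directed_broadcast(uint32_t addr, int prefix_len)
{
    uint32_t mask;

    assert(prefix_len >= 0 && prefix_len <= 32);
    /* A shift by 32 is undefined in C, so the /0 case is handled separately. */
    mask = (prefix_len == 0) ? 0u : (0xFFFFFFFFu << (32 - prefix_len));
    return addr | ~mask;
}
```

For 128.138.202.92, a 16-bit prefix gives the net-directed broadcast 128.138.255.255 and a 24-bit prefix gives the subnet-directed broadcast 128.138.202.255.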

Multicast

Class D addresses are multicast addresses 224.0.0.0 through 239.255.255.255

A specific multicast address defines a “network group”

Two special network groups 224.0.0.xxx is never routed 224.0.0.1 - “all hosts group” 224.0.0.2 - “all routers group”

Well-defined multicast groups

ntp.mcast.net is 224.0.1.1 (Network Time Protocol)

Joining a Multicast Group prior to receive

//
// Set socket option to join mcast group
//
{
    struct ip_mreq mreq;

    // Desired multicast group
    memcpy(&mreq.imr_multiaddr, &from_addr.sin_addr.s_addr, sizeof(struct in_addr));

    // Let the kernel choose the interface
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);

    ret = setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
    check_and_exit(ret, "setsockopt");
}

Using TTL to define multicast “scope”

TTL field is used to limit propagation of multicast packets in IPv4

0 - “node local” - doesn’t leave machine
1 - “link local” - doesn’t get routed
<32 - “site local” - ….But what’s a site?…
<255 - “global” - The world

Setting a TTL scope

//
// Set socket option to limit mcast scope
//
{
    u_char ttl = 16;    // Desired TTL field

    ret = setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
    check_and_exit(ret, "setsockopt");
}

Administrative Scoping

239.xxx.yyy.zzz is the administratively scoped multicast IP space

Addresses assigned locally to an organization, but not unique across organizations

Border routers must not forward

link-local -- 224.0.0.0 to 224.0.0.255
site-local -- 239.255.0.0 to 239.255.255.255
organization-local -- 239.192.0.0 to 239.195.255.255
global -- 224.0.1.0 to 238.255.255.255
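The ranges listed above translate directly into a range check. The sketch below is illustrative only: the enum and function names are invented here, addresses are assumed to be in host byte order, and anything in 239.x.x.x outside the two listed blocks is reported as "other administratively scoped".

```c
#include <stdint.h>

#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

enum mcast_scope {
    NOT_MULTICAST,
    SCOPE_LINK_LOCAL,     /* 224.0.0.0   to 224.0.0.255     */
    SCOPE_GLOBAL,         /* 224.0.1.0   to 238.255.255.255 */
    SCOPE_ORG_LOCAL,      /* 239.192.0.0 to 239.195.255.255 */
    SCOPE_SITE_LOCAL,     /* 239.255.0.0 to 239.255.255.255 */
    SCOPE_OTHER_ADMIN     /* rest of 239.0.0.0/8            */
};

enum mcast_scope classify_multicast(uint32_t addr)
{
    if ((addr >> 28) != 0xE)                       /* class D starts with 1110 */
        return NOT_MULTICAST;
    if (addr <= IPV4(224, 0, 0, 255))
        return SCOPE_LINK_LOCAL;
    if (addr <= IPV4(238, 255, 255, 255))
        return SCOPE_GLOBAL;
    if (addr >= IPV4(239, 192, 0, 0) && addr <= IPV4(239, 195, 255, 255))
        return SCOPE_ORG_LOCAL;
    if (addr >= IPV4(239, 255, 0, 0))
        return SCOPE_SITE_LOCAL;
    return SCOPE_OTHER_ADMIN;
}
```

Under these rules 224.0.0.1 (all hosts) is link-local, while 224.0.1.1 (NTP) falls in the global range.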

Converting Multicast to Ethernet

Multicast addresses are targeted to a number of clients. How does the ethernet card know which messages to receive?

Could simply broadcast all packets
Takes the same amount of network bandwidth as selective multicast, but..
Disturbs all machines

Can use ARP to advertise single MAC as resolving multiple IP addresses
..But multiple machines want to receive

Ethernet cards can usually receive on multiple MAC addresses
Multicast router enters a “virtual” MAC address, clients receive on that virtual MAC

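Mapping a Multicast Address to Ethernet Address

[Figure: a class D IP address (leading bits 1110) mapped to an Ethernet multicast address; the low-order 23 bits of the IP address are copied into the Ethernet address 01:00:5e:00:00:00.]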
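The mapping keeps only the low-order 23 bits of the group address and places them under the fixed prefix 01:00:5e. A minimal sketch, assuming the group address is in host byte order (`mcast_to_mac` is a made-up name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a class D IPv4 address to its Ethernet multicast address. */
void mcast_to_mac(uint32_t group, uint8_t mac[6])
{
    assert(mac != NULL);
    mac[0] = 0x01;                      /* fixed prefix 01:00:5e */
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (group >> 16) & 0x7f;      /* top bit of this byte is always 0 */
    mac[4] = (group >> 8) & 0xff;
    mac[5] = group & 0xff;
}
```

Because 5 of the 28 group bits are dropped, 32 different groups share each Ethernet address (for example 224.0.1.1 and 224.128.1.1), so the IP layer still has to filter on the full group address.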

Digression - Virtual IP Addresses

alias alias_address[/bitmask] Establishes an additional network address for this interface. Example:

ifconfig eth0 alias 128.138.241.79/26

The following aliaslist command adds network addresses 40 through 50, inclusive, to subnets 18.240.32, 18.240.64, and 18.240.96

ifconfig aliaslist 18.240.32,64,96.40-50

Doesn’t require multiple MAC addresses, but often implemented using them.

IGMP - Internet Group Management Protocol

Part of IP layer Lets hosts & routers know who belongs to what groups

IP Header (20 bytes) | IGMP Message (8 bytes)

IGMP Message Format

Version(1) Type(1-2) MBZ 16-bit checksum

32-bit group address (Class D IP address)

Type
1 is a query sent by multicast router
2 is a response sent by a host

Group address is a class D IP address
On query, it’s zero
On response, it’s the group address being reported
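To make the 8-byte layout concrete, here is a sketch that fills in a message and its checksum. It assumes the version and type share the first byte (version in the high 4 bits) and that the checksum is the usual Internet one's-complement sum over the 8 bytes; `internet_checksum` and `igmp_build` are names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, complemented. */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                              /* odd trailing byte */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                         /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an IGMP version 1 message; type is 1 (query) or 2 (report).
 * group is the class D address in host byte order (0 for a query). */
void igmp_build(uint8_t msg[8], unsigned type, uint32_t group)
{
    uint16_t ck;

    msg[0] = (uint8_t)((1u << 4) | (type & 0x0F));   /* version 1, type */
    msg[1] = 0;                                       /* MBZ */
    msg[2] = 0;                                       /* checksum starts at 0 */
    msg[3] = 0;
    msg[4] = (uint8_t)(group >> 24);
    msg[5] = (uint8_t)(group >> 16);
    msg[6] = (uint8_t)(group >> 8);
    msg[7] = (uint8_t)group;
    ck = internet_checksum(msg, 8);
    msg[2] = (uint8_t)(ck >> 8);
    msg[3] = (uint8_t)(ck & 0xFF);
}
```

A receiver can validate a message by running `internet_checksum` over all 8 bytes; a correct message yields 0.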

IGMP Host Reports

Host sends a report when it joins a group
Doesn’t report when it leaves the group, but doesn’t respond to next query

[Figure: Host to Router]
IGMP report, TTL = 1
IGMP group addr = group address
dest IP addr = group address
src IP addr = host’s IP address

IGMP Router Query

Router sends query at regular intervals to see if anyone still belongs to any groups. Queries sent out each interface.

Host responds by sending one response for each group to which it belongs

[Figure: Router to Host]
IGMP query, TTL = 1
IGMP group addr = 0.0.0.0
dest IP addr = 224.0.0.1
src IP addr = router’s IP address

Sample Query on Windows Bootup

05:52:32.517937 arp who-has 192.168.1.6 tell 192.168.1.6

05:52:32.518010 linux > 192.168.1.6: icmp: echo request

05:52:33.378928 192.168.1.6 > ALL-ROUTERS.MCAST.NET: icmp: router solicitation

05:52:37.511385 arp who-has 192.168.1.6 tell linux

05:52:37.511664 arp reply 192.168.1.6 is-at 0:a0:cc:3b:95:4b



Multicast Routes

Sender Sends Datagram With Specified TTL

Pruned because no one is listening

Receiver Starts, Joins group

Routers Form Destination Tree

Non-participants prune themselves

TCP Protocol


IP Header (20 bytes) | TCP Header (20 bytes) | TCP Data (variable size)

TCP Segment = TCP Header + TCP Data

IP Datagram = IP Header + TCP Segment

The TCP Protocol


16-bit source port number | 16-bit destination port number

32-bit sequence number

32-bit acknowledgment number

Header lth | reserved | flags | 16-bit window size

16-bit TCP Checksum | 16-bit Urgent Pointer

Options (if any)

Data (if any)

The TCP Flags

URG - urgent pointer is valid (Stevens, 20.8)
ACK - the acknowledgment number is valid
PSH - the receiver should pass this data to the application as soon as possible (“push”)
RST - reset the connection (Stevens, 18.7)
SYN - synchronize the sequence numbers to initiate a connection
FIN - sender is finished sending data
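The six flags occupy the low six bits of one header byte, with FIN as the least significant bit and URG as bit 5. A sketch of a decoder, where `tcp_flags_to_string` is a hypothetical helper and the bit values follow the standard header layout:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the names of the set flags into out, separated by spaces. */
void tcp_flags_to_string(unsigned flags, char *out, size_t out_size)
{
    static const struct { unsigned bit; const char *name; } table[] = {
        {0x20, "URG"}, {0x10, "ACK"}, {0x08, "PSH"},
        {0x04, "RST"}, {0x02, "SYN"}, {0x01, "FIN"}
    };
    size_t i;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (flags & table[i].bit) {
            /* strncat is given the space left, minus the terminator */
            if (out[0] != '\0')
                strncat(out, " ", out_size - strlen(out) - 1);
            strncat(out, table[i].name, out_size - strlen(out) - 1);
        }
    }
}
```

For example, the second segment of a connection setup carries flags 0x12, which decodes to "ACK SYN".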

TCP Header

The combination of an IP address and a port number is called a socket in RFC 793.

A socket pair specifies a TCP connection

The sequence number is used to number the starting byte of each segment. The sequence numbers wrap around after 2^32 bytes have been sent.

When the SYN flag is set, the sequence number contains the initial sequence number (ISN).

The acknowledgment number contains the next sequence number the sender of the ACK expects to receive
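Because sequence numbers wrap, "earlier" and "later" have to be decided modulo 2^32 rather than with a plain `<`. One common way to do this, shown here as a sketch with made-up function names, is to look at the sign of the 32-bit difference:

```c
#include <stdint.h>

/* True if sequence number a comes before b, allowing for wraparound.
 * Valid as long as the two numbers are less than 2^31 apart.
 * Relies on the usual two's-complement conversion to int32_t. */
int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Advance a sequence number by n bytes; unsigned arithmetic wraps for us. */
uint32_t seq_advance(uint32_t seq, uint32_t n)
{
    return seq + n;
}
```

With this comparison, 0xFFFFFFF0 is "before" 0x00000010 even though it is numerically larger.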

Acknowledgments

TCP uses a sliding window protocol without selective or negative acknowledgments.

Selective acknowledgments would let the protocol say it’s missing a range of bytes. TCP can only say that it has received “up to byte N”.

The protocol has no way to specify a negative acknowledgment. It can only say what has been received


Other Header Fields

TCP’s flow control is limited by a window size, which represents space allocated by the O/S for the connection. The sender should not transmit more data than the window size can hold.

When a connection is started, each end advertises its window size in the 16-bit window field. Can be <=65535 bytes, but this value can be “scaled” to allow larger sizes (Stevens 24.4)

The urgent pointer is a positive offset that must be added to the sequence number of the segment to yield the sequence number of the last byte of urgent data. This is used to send “emergency data” to the other end (Stevens 20.8)

Connection Protocol: Three-way handshake

The client sends a SYN segment specifying the port number of the server and the client’s ISN

The server responds with a SYN carrying its own ISN. The server ACKs the client SYN using client ISN+1.

A SYN consumes one sequence number

The client must ACK the SYN from the server using the server ISN+1

The side sending the first SYN is said to perform an active open. The other side performs a passive open.
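The numbers exchanged in the handshake can be written out explicitly. This sketch only computes the fields; `struct tcp_segment` and `three_way_handshake` are invented for illustration and do not send anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tcp_segment {
    int      syn;      /* SYN flag */
    int      ack;      /* ACK flag */
    uint32_t seq;      /* sequence number */
    uint32_t ack_no;   /* acknowledgment number (valid only if ack is set) */
};

/* Fill in the three segments of a connection setup. */
void three_way_handshake(uint32_t client_isn, uint32_t server_isn,
                         struct tcp_segment seg[3])
{
    assert(seg != NULL);

    /* 1. client -> server: SYN carrying the client's ISN */
    seg[0].syn = 1; seg[0].ack = 0;
    seg[0].seq = client_isn; seg[0].ack_no = 0;

    /* 2. server -> client: SYN with the server's ISN, ACK of client ISN + 1 */
    seg[1].syn = 1; seg[1].ack = 1;
    seg[1].seq = server_isn; seg[1].ack_no = client_isn + 1;

    /* 3. client -> server: ACK of server ISN + 1; the client's SYN
     *    consumed one sequence number, so seq is client ISN + 1 */
    seg[2].syn = 0; seg[2].ack = 1;
    seg[2].seq = client_isn + 1; seg[2].ack_no = server_isn + 1;
}
```

With a client ISN of 1000 and a server ISN of 5000, the server acknowledges 1001 and the client acknowledges 5001.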

Connection Timeline

Termination

Four segments are needed to terminate a connection, because each direction is closed separately (half-close).
Receipt of a FIN only means no more data will flow in that particular direction. The other direction may still be active.

The FIN sender performs an active close, the other performs a passive close.

When the server receives a FIN, it sends an ACK of the received sequence number plus one (segment 5)
O/S delivers end of file to application
Server then closes its connection, causing a FIN to the client, which the client ACKs using the sequence number + 1

Normal Termination

TCP State Transition Diagram

Normal TCP/IP Connection & Termination

You can use “netstat” to see connection state

Proto Recv-Q Send-Q Local Address Foreign Address State

tcp 1 0 dialup-85-157.Col:32779 a216-200-14-151.dep:www CLOSE_WAIT

tcp 1 0 dialup-85-157.Col:32780 a216-200-14-151.dep:www CLOSE_WAIT

tcp 0 136 linux:telnet grok:1050 ESTABLISHED

tcp 0 0 dialup-85-157.Colo:1023 foobar.cs.Colorado.:ssh ESTABLISHED

udp 0 0 localhost:32856 localhost:domain ESTABLISHED

The 2-MSL Wait State

Every implementation chooses a value for the maximum segment lifetime (MSL) -- the maximum time any segment can exist in the network before being discarded

Specified in RFC 793 as 2 minutes, but common implementation values are 30 seconds, 1 minute, or 2 minutes

When TCP performs an ACTIVE CLOSE and sends the final ACK, that connection must stay in the TIME_WAIT state for 2*MSL.

This lets TCP re-send final ACK if it’s lost --- if all connection information was gone, it couldn’t retransmit

The 2-MSL Wait State

While a connection is in 2MSL, the socket pair cannot be reused

Most BSD systems also insist that the port can’t be re-used while that (local) port number is in a 2MSL state

Use SO_REUSEADDR to over-ride that

Normally, the CLIENT does the active close and the CLIENT enters the TIME_WAIT state

2MSL wait not typically an issue since the CLIENT usually picks an ephemeral port, so no one cares if it can’t be reused
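The SO_REUSEADDR override is set with setsockopt before bind. A minimal sketch, with hypothetical helper names, that creates a TCP socket, turns the option on and reads it back:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket with SO_REUSEADDR enabled. Returns the fd or -1.
 * A server would follow this with bind() and listen(). */
int make_reusable_tcp_socket(void)
{
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if SO_REUSEADDR is set on fd, 0 if not, -1 on error. */
int reuseaddr_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

The option has to be set before bind is called; setting it afterwards does not help a bind that has already failed.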

TIME-WAIT for servers

The TIME_WAIT / 2MSL state causes problems for servers
An active close on a well-known port means the server can’t be restarted for 2-4 minutes, depending on MSL.

Allegedly true even if SO_REUSEADDR is specified
But, BSD systems allow a new connection to be established if the ISN is larger than the final sequence number of the previous connection

TCP Options

End-of-option list
No-op (used to align options on 32-bit boundaries)
MSS
Window scale factor
Timestamp (timestamp value & echo reply)

Server Design

Restricting local IP addresses
Same rules as UDP

Restricting foreign IP addresses
Most API’s don’t support a “connect” on the server to allow it to fully specify the remote end-point.

Server specifies incoming connection request queue
Backlog of active connections

Overview of Mobile IP

IPv6 Design Goals

IPv4 was very successful, but the limited addresses pose problems

Experience had shown that aspects of IPv4 were problematic: option headers, fragments

Simplifications for IPv6
Move to 128-bit addresses
Assign a fixed format to all headers
Remove the header checksum
Use “extension headers” rather than options
Remove the hop-by-hop segmentation (fragmentation) procedure

IPv4 Header

Version Hdr Lth Type of Svc Total length (in bytes)



Source IP Address



Data

IPv6 Header

Version | Class | Flow Label

Payload Length | Next Header | Hop Limit

Source Address

Destination Address

IPv6 Header

Version -- 6

Class -- used to assign a service class for real time networking

Flow -- used to identify packets that are in a “flow”, or which should receive the same routing behavior at intermediate points (not a virtual circuit identifier or specifier!)

Payload Length -- Only includes payload (not the 40-byte header); 16 bits, so packets < 64K

Next Header -- the type of the next header (e.g., TCP, UDP or one of the extension headers)

Hop limit -- TTL renamed for honesty
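The first 32-bit word of the header packs Version, Class and Flow Label together. The sketch below assumes the RFC 2460 widths (4-bit version, 8-bit class, 20-bit flow label), which the slides do not state, and works on a host-byte-order word; all function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Pack version (4 bits), class (8 bits) and flow label (20 bits). */
uint32_t ipv6_pack_word0(unsigned version, unsigned tclass, uint32_t flow)
{
    assert(version <= 0xF && tclass <= 0xFF && flow <= 0xFFFFF);
    return ((uint32_t)version << 28) | ((uint32_t)tclass << 20) | flow;
}

unsigned ipv6_version(uint32_t word0)    { return word0 >> 28; }
unsigned ipv6_class(uint32_t word0)      { return (word0 >> 20) & 0xFF; }
uint32_t ipv6_flow_label(uint32_t word0) { return word0 & 0xFFFFF; }
```

A packet with no special class and no flow label therefore starts with the word 0x60000000; on the wire the word is sent in network byte order (see htonl).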

(non) Coexistence

The original intent was to have IPv4 and IPv6 deployed concurrently over the same network fabric

That idea has been pitched. IPv6 has been assigned an Ethernet Content Type of 0x86DD vs. the 0x0800 for IPv4

The “6BONE” provides a virtual IPv6 network using IPv4 encapsulation akin to “MBONE”.

Fragments

Lesson: Unit of transmission should be unit of control

No fragments created enroute in IPv6

If message > MTU, you get ICMP message and should use PMTU

However, there is a way to fragment a datagram, but it’s done in an “end-to-end” fashion.

From Options To Extension Headers

IPv6 Header (Next Header = TCP) | TCP Header & Payload

IPv6 Header (Next Header = Routing) | Routing Header (Next Header = TCP) | TCP Header & Payload

Extension Headers

Goal: Intermediate routers don’t need to look at the headers. Unless we tell them to.

Extension Headers & Protocols (e.g. TCP) share the same 256-entry name space, so limited number of extensions

Current IPv6 Extension Headers
Routing Header
Fragment Header
Destination Options Header
Hop-by-Hop Options Header
Authentication Header
Encrypted Security Payload

Routing Extension Header

Next Header | Hdr Ext Len | Routing Type = 0 | Segments Left

Reserved

Address[1]

...

Address[n]

Routing Extension Header

Plays same role as source routing header Basic idea:

When a datagram reaches a destination, the destination checks for a routing header. If there is at least one segment left, that address is copied from the routing header and the packet is forwarded to that address.

Otherwise, the routing header is removed and the next routing header is processed.

You can have multiple routing headers if the 8-bit header length causes a problem.

You can specify other source routing modes using “type”

Fragment Header

Next Header | Reserved | Fragment Offset (13 bits) | RES | M

Identification

Each fragment routed independently
“Identification” identifies the original packet that was fragmented
The “offset” is the offset of the fragment within the original packet
The “M” field is a “more fragments” bit and is set to 1 for all but the last fragment
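The 16 bits that hold the offset and the M bit can be decoded with two shifts. This sketch assumes the RFC 2460 layout (13-bit offset counted in 8-octet units, 2 reserved bits, then M) and a field already converted to host byte order; `struct frag_info` and `decode_frag_field` are names made up here.

```c
#include <stdint.h>

struct frag_info {
    unsigned offset_bytes;   /* where this fragment starts in the original packet */
    int      more;           /* 1 if more fragments follow */
};

/* Decode the 16-bit "Fragment Offset | RES | M" field. */
struct frag_info decode_frag_field(uint16_t field)
{
    struct frag_info info;

    info.offset_bytes = (unsigned)(field >> 3) * 8;  /* offset is in 8-octet units */
    info.more = field & 1;                           /* lowest bit is M */
    return info;
}
```

A fragment starting 1480 bytes into the original packet has offset 185, which appears in the field as 185 << 3 = 0x05C8, or 0x05C9 if more fragments follow.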

Destination Options Header

When a packet reaches its final destination (or at least when all prior routing extensions have been processed), the destination options header is processed

Unknown options are (optionally) discarded

Next Header | Reserved | Options

Options

Option Type | Opt Data Len | Option Data

Option Type bits: A (action for an unknown option), C (change en route)
A = 00 - Skip
A = 01 - Discard, no ICMP
A = 10 - Discard, send ICMP
A = 11 - Discard, send ICMP if not mcast
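The action and change-en-route bits sit at the top of the Option Type byte. A sketch of a decoder, assuming the two action bits are the highest-order bits and the change bit is the third-highest (as in RFC 2460); the enum and function names are illustrative.

```c
#include <stdint.h>

enum opt_action {
    OPT_SKIP = 0,                      /* 00 - skip the option */
    OPT_DISCARD = 1,                   /* 01 - discard, no ICMP */
    OPT_DISCARD_ICMP = 2,              /* 10 - discard, send ICMP */
    OPT_DISCARD_ICMP_NOT_MCAST = 3     /* 11 - discard, send ICMP if not mcast */
};

/* What a node must do if it does not recognize this option type. */
enum opt_action option_action(uint8_t option_type)
{
    return (enum opt_action)(option_type >> 6);
}

/* 1 if the option data may change en route, 0 otherwise. */
int option_may_change(uint8_t option_type)
{
    return (option_type >> 5) & 1;
}
```

For instance, an option type of 0xC2 decodes to "discard, send ICMP if not mcast" with data that does not change en route.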

Hop-by-Hop Options Header

Hop-by-hop options are processed at each hop

Example: “Jumbo payload header”. The payload length in the IPv6 header is zero and the jumbo option encodes the true length as a 32-bit value

Also used to mark spanning trees for multicast and realtime protocols, where information needs to be “deposited” on each intermediate router

Next Header Reserved

Options

Options

Extension Header Order

Extension headers are removed & processed like an “onion peel”

Suggested order
IPv6 Header
Hop-by-Hop
Destination options header (1)
Routing Header
Fragment Header
Authentication Header
Destination Options Header (2)
Upper-layer header (e.g. TCP or UDP)

Peeling Extension Headers

IP Header | Routing Header | Auth Header | Routing Header | Routing Header | TCP Payload

IP Header | Auth Header | Routing Header | Routing Header | TCP Payload

IP Header | Auth Header | Routing Header | TCP Payload

IP Header | Routing Header | TCP Payload

IP Header | TCP Payload

Naming - Aggregatable Global Unicast Addresses

Move away from provider-based to routing based ID’s
Top Level Aggregation -- essentially a hierarchical organization reflecting the current internet architecture
Next Level Aggregator
Site Level Aggregator -- allocated to a link within a site
The interface ID is based on EUI-64 (an extension of the ethernet MAC address)

001 | TLA (13) | NLA (32) | SLA (16) | Interface ID

Other Address

Unspecified address - 16 null bytes
Loopback - 0:0:0:0:0:0:0:1
Site local
Last 80 bits same as the normal address, but specified independently of the TLA/SLA
Link local
Multicast
Anycast

Security Associations

Authentication & encryption requires that senders and receivers agree on
A key
An authentication or encryption algorithm
Set of ancillary parameters such as the lifetime of the key or details about the algorithm

This is a security association

Authentication Headers

The SPI is selected by the receiver and is used to describe the security association & normally negotiated during the key exchange

Next Header Len Reserved

Security Parameters Index

Sequence Number Field

Authentication Data (variable)

Encrypted Security Payload

Last (unencrypted) header in the chain
ESP header also includes authentication to prevent tampering with the encrypted data

IPv6 | Ext Header | ESP Header | Encrypted Data (encrypted) | Authentication Data

Key Distribution

SKIP - like Diffie-Hellman, but each network entity must pick a static secret j and publicize g^j in a directory

The key between two hosts, K_ij = g^(ij), is static, which means you could crack it with enough time

SKIP only uses the static key in the key exchange phase, and then combines it with a time-varying field. The resulting key is used to encrypt the actual session key

ISAKMP-OAKLEY Internet security association and key management protocol
