Internet Protocols Fall 2005
Transcript of Internet Protocols Fall 2005
Internet ProtocolsFall 2005
Lectures 7-8Andreas Terzis
Outline
CS 349/Fall05 2
• Internet Protocol– Service Model– Fragmentation– Addressing
• Original addressing scheme• Subnetting• CIDR
– Forwarding– ICMP– ARP– Address Shortage
• NAT• IPv6
IP Internet
CS 349/Fall05 3
• Concatenation of Networks
• Protocol Stack
R2
R1
H4
H5
H3H2H1
Network 2 (Ethernet)
Network 1 (Ethernet)
H6
Network 4(point-to-point)
H7 R3 H8
Network 3 (FDDI)
R1 R2 R3
H1 H8
ETH FDDI
IP
ETH
TCP
FDDI PPP PPP ETH
IP
ETH
TCP
IP IP IP
Service Model
CS 349/Fall05 4
• Connectionless (datagram-based)• Best-effort delivery (unreliable service)
– packets are lost– packets are delivered out of order– duplicate copies of a packet are delivered– packets can be delayed for a long time
• Datagram format
Version HLen TOS Length
Ident Flags Offset
TTL Protocol Checksum
SourceAddr
DestinationAddr
Options (variable) Pad(variable)
0 4 8 16 19 31
Data
App
Transport
Network
TCP / UDP
IP
Data Hdr
Data
TCP Segment
IP DatagramHdr
Link
Fragmentation
CS 349/Fall05 5
Problem: A router may receive a packet larger than the maximum transmission unit (MTU) of the outgoing link.
Source Destination
AMTU=1500 bytes
BMTU=1500 bytesEthernet
MTU<1500 bytesR1 R2
Solution: R1 fragments the IP datagram into mutiple, self-contained datagrams.
Data HDR (ID=x)
Data HDR (ID=x) Data HDR (ID=x) Data HDR (ID=x)
Offset>0More Frag=0
Offset=0More Frag=1
Fragmentation (II)
CS 349/Fall05 6
• Fragments are re-assembled by the destination host; not by intermediate routers.
• To avoid fragmentation, hosts commonly use path MTU discovery to find the smallest MTU along the path.
• Path MTU discovery involves sending various size datagramsuntil they do not require fragmentation along the path.
• Most links use MTU>=1500bytes today. • Try: traceroute –f www.mit.edu 1500 and
traceroute –f www.mit.edu 1501• (DF=1 set in IP header; routers send “ICMP” error message,
which is shown as “!F”).
Global Addresses
CS 349/Fall05 7
• Properties– globally unique– hierarchical: network + host
• Dot Notation– 10.3.2.4– 128.96.33.81– 192.12.69.77
Network Host
7 24
0(a)
Network Host
14 16
1 0(b)
Network Host
21 8
1 1 0(c)
Datagram Forwarding
CS 349/Fall05 8
• Strategy– every datagram contains destination’s address– if connected to destination network, then forward
to host– if not directly connected, then forward to some
router– forwarding table maps network number into next
hop– each host has a default router– each router maintains a forwarding table
• Example (R2)
Network Number Next Hop
1 R3
2 R1
3 interface 1
4 interface 0
R2
R1
H4
H5
H3H2H1
Network 2 (Ethernet)
Network 1 (Ethernet)
H6
Network 4(point-to-point)
H7 R3 H8
Network 3 (FDDI)
IP Addressing
CS 349/Fall05 9
• Problem:– Address classes were too “rigid”. For most organizations,
Class C were too small and Class B too big. Led to very inefficient use of address space, and a shortage of addresses.
– Organizations with internal routers needed to have a separate (Class C) network ID for each link.
– And then every other router in the Internet had to know about every network ID in every organization, which led to large address tables.
– Small organizations wanted Class B in case they grew to more than 255 hosts. But there were only about 16,000 Class B network IDs.
IP Addressing
CS 349/Fall05 10
• Two solutions were introduced:• Subnetting is used within an organization to
subdivide the organization’s network ID.• Classless Interdomain Routing (CIDR) was
introduced in 1993 to provide more efficient and flexible use of IP address space across the whole Internet.
• CIDR is also known as “supernetting”because subnetting and CIDR are basically the same idea.
Subnetting
CS 349/Fall05 11
• Add another level to address/routing hierarchy: subnet
• Subnet masks define variable partition of host part• Subnets visible only within site
Network number Host number
Class B address
Subnet mask (255.255.255.0)
Subnetted address
111111111111111111111111 00000000
Network number Host IDSubnet ID
Subnet Example
CS 349/Fall05 12
Forwarding table at router R1Subnet Number Subnet Mask Next Hop128.96.34.0 255.255.255.128 interface 0128.96.34.128 255.255.255.128 interface 1128.96.33.0 255.255.255.0 R2
Subnet mask: 255.255.255.128Subnet number: 128.96.34.0
128.96.34.15128.96.34.1
H1 R1
128.96.34.130 Subnet mask: 255.255.255.128Subnet number: 128.96.34.128
128.96.34.129128.96.34.139
R2H2
128.96.33.1128.96.33.14
Subnet mask: 255.255.255.0Subnet number: 128.96.33.0
H3
Forwarding Algorithm
CS 349/Fall05 13
D = destination IP addressfor each entry (SubnetNum, SubnetMask, NextHop)
D1 = SubnetMask & Dif D1 = SubnetNum
if NextHop is an interfacedeliver datagram directly to D
elsedeliver datagram to NextHop
• Use a default router if nothing matches• Not necessary for all 1s in subnet mask to be contiguous • Can put multiple subnets on one physical network• Subnets not visible from the rest of the Internet
Classless Interdomain Routing (CIDR)Addressing
CS 349/Fall05 14
The IP address space is broken into line segments.Each line segment is described by a prefix.A prefix is of the form x/y where x indicates the prefix of all addresses in the line segment, and y indicates the length of the segment.e.g. The prefix 128.9/16 represents the line segment containing addresses in the range: 128.9.0.0 … 128.9.255.255.
0 232-1
128.9/16
128.9.0.0
216
142.12/1965/8
128.9.16.14
CS 349/Fall05 15
Classless Interdomain Routing (CIDR)Addressing
0 232-1
128.9/16
128.9.16.14
128.9.16/20 128.9.176/20
128.9.19/24
128.9.25/24
Most specific route = “longest matching prefix”
Classless Interdomain Routing (CIDR)Addressing
CS 349/Fall05 16
Prefix aggregation:If a service provider serves two organizations with prefixes, itcan (sometimes) aggregate them to form a larger prefix. Other routers can refer to this larger prefix, and so reduce the size of their address table.E.g. ISP serves 128.9.14.0/24 and 128.9.15.0/24, it can tell other routers to send it all packets belonging to the prefix 128.9.14.0/23.
ISP Choice:In principle, an organization can keep its prefix if it changes service providers.
Hierarchical addressing: route aggregation
CS 349/Fall05 17
Hierarchical addressing allows efficient advertisement of routing information:
“Send me anythingwith addresses beginning 200.23.16.0/20”
200.23.16.0/23
200.23.18.0/23
200.23.30.0/23
Fly-By-Night-ISP
Organization 0
Organization 7Internet
Organization 1
ISPs-R-Us “Send me anythingwith addresses beginning 199.31.0.0/16”
200.23.20.0/23Organization 2
...
...
Hierarchical addressing: route aggregation
CS 349/Fall05 18
200.23.16.0/23
200.23.18.0/23
200.23.30.0/23
Fly-By-Night-ISP
Organization 0
Organization 7Internet
Organization 1
ISPs-R-Us “Send me anything with addresses beginning 199.31.0.0/16”
200.23.20.0/23Organization 2
...
...
“Send me anything with addresses beginning 200.23.16.0/20”
Multi-homing“Send me anything with addresses beginning 199.31.0.0/16, or200.23.30.0/23 ”
CS 349/Fall05 19
Size of the Routing Table at the core of the Internet
Source: http://bgp.potaroo.net/
Prefix Length Distribution
CS 349/Fall05 20
0
10000
20000
30000
40000
50000
60000
70000
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Prefix Length
Num
ber
of P
refixe
s
Source: Geoff Huston, Oct 2001
How a Router Forwards Datagrams
CS 349/Fall05 21
e.g. 128.9.16.14 => Port 2
R1
R2
R3
R4
12
3
128.17.20.1
128.17.16.1
128.9/16128.9.16/20
128.9.176/20
128.9.19/24128.9.25/24
142.12/19
65/8
Prefix Port3227213
128.17.14.1128.17.14.1
128.17.20.1
128.17.10.1128.17.14.1
128.17.16.1
128.17.16.1
Next-hop
Forwarding table
Forwarding in an IP Router
CS 349/Fall05 22
• Lookup packet DA in forwarding table.– If known, forward to correct port.– If unknown, drop packet.
• Decrement TTL, update header Checksum.• Forward packet to outgoing interface.• Transmit packet onto link.
Question: How is the address looked up in a real router?
Making a Forwarding DecisionClass-based addressing
CS 349/Fall05 23
IP Address Space
Class A Class B Class C D
Class A
Class B
Class C212.17.9.0 Port 4
ExactMatch
Routing Table:212.17.9.4
212.17.9.0
Exact Match: There are many well-known ways to find an exact match in a table.
Lookup Performance Required
CS 349/Fall05 24
Line Line Rate Pkt-size=40B Pkt-size=240B
T1 1.5Mbps 4.68 Kpps 0.78 Kpps
OC3 155Mbps 480 Kpps 80 Kpps
OC12 622Mbps 1.94 Mpps 323 Kpps
OC48 2.5Gbps 7.81 Mpps 1.3 Mpps
OC192 10 Gbps 31.25 Mpps 5.21 Mpps
Direct Lookup
CS 349/Fall05 25
IP Address
Address
MemoryD
ataNext-hop, Port
Problem: With 232 addresses, the memory would require 4 billion entries.
Associative Lookups“Contents addressable memory” (CAM)
CS 349/Fall05 26
Advantages:• Simple
Disadvantages• Slow• High Power• Small• Expensive
AssociativeMemory or CAM
NetworkAddress
PortNumber
PortNumberSearch
Data
32Hit?
Hashed Lookups
CS 349/Fall05 27
HashingFunction
Memory
Addr
ess
Dat
a
Search Data
log2N
AssociatedData
Hit?
Address{1632
Lookups Using HashingAn example
CS 349/Fall05 28
Hashing Function16
#1 #2 #3 #4
#1 #2
#1 #2 #3Linked list of entrieswith same hash key.
Memory
Search Data
AssociatedData
Hit?32
Lookups Using Hashing
CS 349/Fall05 29
Advantages:• Simple
• Expected lookup time can be small
Disadvantages• Non-deterministic lookup time
• Inefficient use of memory
Patricia Tries
CS 349/Fall05 30
Example Prefixes:a) 00001b) 00010c) 00011d) 001e) 0101f) 011g) 100h) 1010i) 1100j) 11110000e
f g
h i
Skip 5j
0 1
a b c
d
How to Implement a router
CS 349/Fall05 31
• Issues– Performance
• Throughput
– Scaling
Workstation-Based
CS 349/Fall05 32
• Aggregate bandwidth– 1/2 of the I/O bus bandwidth – capacity shared among all hosts connected to router– example: 1Gbps bus can support 5 x 100Mbps ports (in theory)
• Packets-per-second– must be able to switch
small packets– 300,000 packets-per-
second is achievable– e.g., 64-byte packets
implies 155Mbps
I/O bus
Interface 1
Interface 2
Interface 3
CPU
Main memory
Switching Hardware
CS 349/Fall05 33
• Design Goals– throughput (depends on traffic model)– scalability (a function of n)
• Ports– buffering (input and/or output)
• Fabric– as simple as possible– sometimes do buffering (internal)
Switchfabric
Controlprocessor
Outputport
Inputport
LAN Addresses and ARP
CS 349/Fall05 34
32-bit IP address:• network-layer address• used to get datagram to destination IP network (recall IP
network definition)LAN (or MAC or physical or Ethernet) address: • used to get datagram from one interface to another
physically-connected interface (same network)• 48 bit MAC address (for most LANs)
“burned” in the adapter EPROM
LAN Address (more)
CS 349/Fall05 35
• MAC address allocation administered by IEEE• manufacturer buys portion of MAC address space (to assure
uniqueness)• Analogy:
(a) MAC address: like Social Security Number(b) IP address: like postal address
• MAC flat address => portability – can move LAN card from one LAN to another
• IP hierarchical address NOT portable– depends on IP network to which node is attached
ARP: Address Resolution Protocol
CS 349/Fall05 36
• Each IP node (Host, Router) on LAN has ARP table
• ARP Table: IP/MAC address mappings for some LAN nodes
< IP address; MAC address; TTL>
– TTL (Time To Live): time after which address mapping will be forgotten (typically 20 min)
Question: how to determineMAC address of Bknowing B’s IP address?
ARP protocol
CS 349/Fall05 37
• A wants to send datagram to B, and A knows B’s IP address.
• Suppose B’s MAC address is not in A’s ARP table.
• A broadcasts ARP query packet, containing B's IP address – all machines on LAN receive
ARP query• B receives ARP packet, replies
to A with its (B's) MAC address– frame sent to A’s MAC address
(unicast)
• A caches (saves) IP-to-MAC address pair in its ARP table until information becomes old (times out) – soft state: information
that times out (goes away) unless refreshed
• ARP is “plug-and-play”:– nodes create their ARP
tables without intervention from net administrator
ARP Protocol (II)
CS 349/Fall05 38
• Proxy ARP– Reply on behalf of another node
• ARP Highjacking– Node can “steal” packets destined to another node– Solutions
• Static ARP entries• Switched LANs• Lock MAC addresses to Switch ports
ICMP
CS 349/Fall05 39
• Internet Control Message Protocol:– Used by a router/end-host to report some types of error:– E.g. Destination Unreachable: packet can’t be forwarded
to/towards its destination. – E.g. Time Exceeded: TTL reached zero, or fragment didn’t
arrive in time. Traceroute uses this error to its advantage.– An ICMP message is an IP datagram, and is sent back to the
source of the packet that caused the error.
IP Address Shortage
CS 349/Fall05 40
• Global IPv4 addresses are getting depleted– Increase in number of hosts
• PCs, PDAs, cellphones, microwaves, etc– Address inefficiencies
• What to do– Get larger address space -> IPv6– Remove the assumption that address is globally unique
• Can reuse the same address multiple times -> NAT
NAT: Network Address Translation
CS 349/Fall05 41
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4
138.76.29.7
local network(e.g., home network)
10.0.0/24
rest ofInternet
Datagrams with source or destination in this networkhave 10.0.0/24 address for
source, destination (as usual)
All datagrams leaving localnetwork have same single source NAT IP address: 138.76.29.7,different source port numbers
NAT: Network Address Translation
CS 349/Fall05 42
10.0.0.1
10.0.0.2
10.0.0.3
S: 10.0.0.1, 3345D: 128.119.40.186, 80
1
10.0.0.4
138.76.29.7
1: host 10.0.0.1 sends datagram to 128.119.40, 80
NAT translation tableWAN side addr LAN side addr138.76.29.7, 5001 10.0.0.1, 3345…… ……
S: 128.119.40.186, 80 D: 10.0.0.1, 3345 4
S: 138.76.29.7, 5001D: 128.119.40.186, 802
2: NAT routerchanges datagramsource addr from10.0.0.1, 3345 to138.76.29.7, 5001,updates table
S: 128.119.40.186, 80 D: 138.76.29.7, 5001 3
3: Reply arrivesdest. address:138.76.29.7, 5001
4: NAT routerchanges datagramdest addr from138.76.29.7, 5001 to 10.0.0.1, 3345
NAT implementation
CS 349/Fall05 43
• NAT router must:– outgoing datagrams: replace (source IP address, port #) of every
outgoing datagram to (NAT IP address, new port #)• . . . remote clients/servers will respond using (NAT IP address, new
port #) as destination addr.– remember (in NAT translation table) every (source IP address, port
#) to (NAT IP address, new port #) translation pair– incoming datagrams: replace (NAT IP address, new port #) in dest
fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table
NAT Problems
CS 349/Fall05 44
• Problems due to NAT– Increased network complexity, reduced
robustness– Cannot run services inside NAT (maybe)
• Address shortage should instead be solved by IPv6
IPv6
CS 349/Fall05 45
• Motivation: 32-bit address space exhaustion• Take the opportunity for some clean-up
• IPv6 datagram format:– fixed-length 40 byte header– Address length changed from 32 bits to 128 bits– fragmentation fields moved out of base header– IP options moved out of base header
• Header Length field eliminated– Header Checksum eliminated– Type of Service field eliminated– Time to Live Hop Limit, Protocol Next Header– Precedence Priority, added Flow Label field– Length field excludes IPv6 header
IPv6 header format
CS 349/Fall05 46
Destination Address (16 bytes)
Version Priority Flow LabelPayload Length Next Header Hop Limit
Source Address (16 bytes, 128 bits)
Version Hdr Len Total LengthIdentification Fragment Offset
Prec TOS
Time to Live Protocol Header ChecksumFlags
Source AddressDestination Address
PaddingOptions
32 bits
IPv4 header
Transition From IPv4 To IPv6
CS 349/Fall05 47
• Not all routers can be upgraded simultaneously
• Two proposed approaches to allow the Internet operate with mixed IPv4 and IPv6 routers :
Dual Stack: some routers with dual stack (v6, v4) can “translate”between formats
Tunneling: IPv6 carried as payload in IPv4 packets among IPv4 routers