School of Computing National University of Singaporetbma/teaching/cs5229y14... · 2014-10-02 ·...
Transcript of School of Computing National University of Singaporetbma/teaching/cs5229y14... · 2014-10-02 ·...
Richard T. B. Ma
School of Computing
National University of Singapore
Forwarding and Routing
CS 5229: Advanced Compute Networks
IP Protocol Stack: Key Abstractions
Best-effort local packet delivery
Best-effort global packet delivery
Reliable streams
Applications
Unreliable datagrams
Link
Network
Transport
Application
Datagram networks
no call setup at network layer
routers: no end-to-end state information no network-level concept of “connection”
packets forwarded using destination host address
1. send datagrams
application
transport
network
data link
physical
application
transport
network
data link
physical
2. receive datagrams
Two Key Network-Layer Functions
Forwarding (data plane): move packets from an input interface to an appropriate output interface in a router
Routing (control plane): determine route taken by packets from source to destination
routing algorithms
analogy:
routing: process of planning trip from source to destination
forwarding: process of getting through single interchange
Forwarding: Hop-by-Hop
Each router has a forwarding table maps destination addresses to outgoing
interfaces
Upon receiving a packet inspect the destination IP address in the
header index into the table
determine the outgoing interface
forward the packet out that interface
Then, the next router on the path repeats and the packet travels along the path to the
destination
Router Architecture Overview
two key router functions: run routing algorithms/protocol
forwarding datagrams from incoming to outgoing link
router input ports switching
fabric
routing
processor
line card
line card
line card
line card
router output ports
Data Plane
Control Plane
line
termination
link layer
protocol (receive)
lookup,
forwarding
queueing
Input port functions
decentralized switching: given datagram destination, lookup output
port using forwarding table in input port memory (“match plus action”)
goal: complete input port processing at “line speed”
queuing: if datagrams arrive faster than forwarding rate into switch fabric
physical layer: bit-level reception
data link layer: e.g., Ethernet
switch fabric
Switching fabrics
transfer packet from input buffer to appropriate output buffer
switching rate: rate at which packets can be transfer from inputs to outputs often measured as multiple of input/output line rate
N inputs: switching rate N times line rate desirable
three types of switching fabrics
memory
memory
bus crossbar
Switching via memory
first generation routers: traditional computers with switching under direct
control of CPU
packet copied to system’s memory
speed limited by memory bandwidth (2 bus crossings per datagram)
input port
(e.g., Ethernet)
memory
output port
(e.g., Ethernet)
system bus
Switching via a bus
datagram from input port memory to output port memory via a shared bus
internal label is used to indicate output port
bus contention: switching speed limited by bus bandwidth
32 Gbps bus, Cisco 5600: sufficient speed for access and enterprise routers
bus
Switching via interconnection nets
overcome bus bandwidth limitations
banyan networks, crossbar, other interconnection nets initially developed to connect processors in multiprocessor
advanced design: fragmenting datagram into fixed length cells, switch cells through the fabric.
Cisco 12000: switches 60 Gbps through the interconnection network
crossbar
Banyan networks
Output ports
buffering required when datagrams arrive from fabric faster than the transmission rate
scheduling discipline chooses among queued datagrams for transmission
line termination
link layer
protocol (send)
switch fabric
datagram
buffer
queueing
Output port queueing
buffering when arrival rate via switch exceeds output line speed
queueing (delay) and loss due to congestion and output port buffer overflow!
at t, packets more
from input to output
one packet time later
switch
fabric
switch
fabric
Input port queuing
fabric slower than input ports combined -> queueing may occur at input queues
delay and loss due to input buffer overflow! Head-of-the-Line (HOL) blocking: queued datagram
at front of queue prevents others in queue from moving forward
output port contention:
only one red packet can be transferred
(lower red packet is blocked)
switch
fabric
one packet time later:
green packet experiences
HOL blocking
switch
fabric
Routing Processor
“Control-Plane” software implementation of routing protocols
creation of forwarding table for the line cards
Handling of special data packets packets with IP options enabled
packets with expired Time-to-Live
Network management functions command-line interface (CLI) for configuration
transmission of measurement statistics
switching fabric
routing
processor
line card
line card
line card
line card
Routing vs. Forwarding
Routing: control plane Computing paths the packets will follow
Routers talking amongst themselves
Creating the forwarding tables
Forwarding: data plane Directing a data packet to an outgoing link
Using the forwarding tables
1 2 3
0111
value in arriving
packet’s header
routing algorithm
local forwarding table
header value output link
0100
0101
0111
1001
3
2
2
1
Interplay between routing and forwarding
Transmitted
(forwarded)
packet
buffers: arriving packets
dropped if no free space
queued packets
routing algorithm determines end-end-path through network
forwarding table determines local forwarding at this router
Router architecture overview two key router functions: run routing algorithms/protocol
forwarding datagrams from incoming to outgoing link
high-seed switching
fabric
routing processor
router input ports router output ports
forwarding data
plane (hardware)
routing, management
control plane (software)
forwarding tables computed,
pushed to input ports
Goal of routing
You may be able to send datagrams directly to the destination
If not, the router attempts to send datagrams to a router that is nearer the destination.
The goal of a routing protocol is very simple: Find a “good” path from source to destination.
Hosts often have default routers/gateways We can focus on the routing from the source router
to the destination router
Routing algorithm classification
global: all routers have complete topology, link cost infomation
“link state” algorithms/protocols
Dijkstra’s algorithm and OSPF protocol (RFC 2328)
decentralized: router knows costs to physically-connected neighbors
iterative process of computation, exchange of info with neighbors
“distance vector” algorithms/protocols
Bellman-Ford algorithm and RIP protocol (RFC 2453)
u
y x
w v
z 2
2 1
3
1
1
2
5 3
5
Graph: G = (N,E)
N = set of routers = { u, v, w, x, y, z }
E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }
Graph abstraction
Remark: Graph abstraction is useful in other network contexts
Example: P2P, where N is set of peers and E is set of TCP connections
Graph abstraction: costs
u
y x
w v
z 2
2 1
3
1
1
2
5 3
5 • c(x,x’) = cost of link (x,x’)
- e.g., c(w,z) = 5
• cost could always be 1, or inversely related to bandwidth, or inversely related to congestion
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Question: What’s the least-cost path between u and z ?
Routing algorithm: algorithm that finds least-cost path
A Link-State Routing Algorithm
Dijkstra’s algorithm net topology, link costs
known to all nodes accomplished via “link
state broadcast”
all nodes have same info
computes least cost paths from one node (source) to all others gives forwarding table
for that node
notation: c(x,y): link cost from
node x to y; = ∞ if not direct neighbors
D(v): current value of cost of path from source to destination v
N': set of nodes whose least cost path definitively known
w 3
4
v
x
u
5
3 7 4
y
8
z 2
7
9
Dijsktra’s algorithm
Step
N' D(v)
0
1
2
3
4
5
D(w) D(x) D(y) D(z)
u ∞ ∞ 7 3 5
uw ∞ 11 6 5
14 11 6 uwx
uwxv 14 10
uwxvy 12
notes: construct shortest
path tree by tracing predecessor nodes
ties can exist (can be broken arbitrarily)
uwxvyz
Distance Vector (DV) algorithm
distributed, iterative, and asynchronous receives info from directly attached neighbors
continues on until no more info is exchanged
nodes can update and send info asynchronously
Dx(y) = estimate of least cost from x to y x maintains distance vector Dx = [Dx(y): y є N ]
For node x: knows cost to each neighbor v: c(x,v)
also maintains its neighbors’DVs. For each neighbor v, x maintains Dv = [Dv(y): y є N ]
Distance Vector Algorithm
Bellman-Ford Equation (dynamic programming)
Define
dx(y) := cost of least-cost path from x to y
Then
dx(y) = min { c(x,v) + dv(y) }
v
cost to neighbor v
cost from neighbor
v to destination y min taken over all
neighbors v of x
The fact behind Bellman-Ford
dx(y) = min {c(x,v) + dv(y) }
dv(y) – shortest distance between v and y c(x,v) – cost between x and v N - number of neighbor of x
v
Bellman-Ford example
u
y x
w v
z
2
2
1 3
1
1
2
5 3
5
clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
du(z) = min { c(u,v) + dv(z), c(u,x) + dx(z), c(u,w) + dw(z) }
= min { 2 + 5, 1 + 3, 5 + 3 } = 4
B-F equation says:
Practical contributions of B-F equation: node achieving minimum is next hop in the
shortest path, used in forwarding table
it suggests the form of the neighbor-to-neighbor communication used in the DV algorithm
key idea: from time-to-time, each node sends its own
distance vector estimate to neighbors
when x receives new DV estimate from neighbor, it updates its own DV using B-F equation:
Dx(y) ← minv { c(x,v) + Dv(y) } for each y ∊ N
under minor, natural conditions, the estimate Dx(y) converge to the actual least cost dx(y)
Distance vector algorithm
x y z
x
y
z
0 2 3
fro
m
cost to
x y z
x
y
z
0 2 7
fro
m
cost to
x y z
x
y
z
0 2 3
fro
m
cost to
x y z
x
y
z
0 2 3
from
cost to
x y z
x
y
z
0 2 7
fro
m
cost to
2 0 1
7 1 0
2 0 1
3 1 0
2 0 1
3 1 0
2 0 1
3 1 0
2 0 1
3 1 0
time
x y z
x
y
z
0 2 7
∞ ∞ ∞
∞ ∞ ∞
fro
m
cost to
fro
m
fro
m
x y z
x
y
z
0
x y z
x
y
z
∞ ∞
∞ ∞ ∞
cost to
x y z
x
y
z ∞ ∞ ∞
7 1 0
cost to
∞
2 0 1
∞ ∞ ∞
2 0 1
7 1 0
time
x z
1 2
7
y
node x table
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2
Dx(z) = min{c(x,y) +
Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
3 2
node y table
node z table
cost to
from
Real routing problems …
count-to-infinity problem (of distance vector algo): Dy(x) = 4, Dy(z) = 1, Dz(y) = 1, and Dz(x) = 5 link cost c(x,y) increases to 60 Dy(x) = min{c(y,x) + Dx(x), c(y,z) + Dz(x)} = min{60 + 0, 1 + 5} = 6 Dz(x) = min{50 + 0,1 + 6} = 7
oscillations problem (of link state algorithm): e.g., suppose link cost = amount of carried traffic:
x z
1 4
50
y 60
A
D
C
B
1 1+e
e 0
e
1 1
0 0
initially
A
D
C
B
given these costs, find new routing…. new costs
2+e 0
0 0
1+e 1
A
D
C
B
given these costs, find new routing…. new costs
0 2+e
1+e 1
0 0
A
D
C
B
given these costs, find new routing…. new costs
2+e 0
0 0
1+e 1
Routing is difficult in practice! Theory is simple, implementation is difficult:
Bellman-Ford algorithm 1-2 pages
RIP protocol, RFC 2453, 39 pages
Dijkstra’s algorithm 1-2 pages
OSPF protocol, RFC 2328, 245 pages
Protocols need to handle issues related to distributed systems in nature: Link failure
Inconsistent views
Routing loops