8/4/2019 Bridges Routers
Chapter 1
Bridges, Switches, Routers
1.1 Introduction
Packet vs. circuit (and virtual circuit) switching
Network: mesh interconnection of links and switches
LANs: multiaccess, broadcast or shared medium (Ethernet: 10BT-1000BT, Cat 3 UTP)
WANs: switches connected by point-to-point links
Packet processors: bridges, routers, ATM switches
[Control path: routing, congestion control, reservation. Data path (per-packet processing): switching, policing, scheduling.]
Figure 1.1: Packet processor functions may involve the data path or the control path.
1.2 Packet processor functions
Routing: creating and distributing the information that defines the path between source and destination, and determining the best path
Switching: per-packet forwarding decisions, and sending the packet towards its destination
Other functions: congestion control, reservations, policing, scheduling
Control functions are performed infrequently; datapath functions are performed per packet.
[Bridged extended LAN: bridges B1-B4 interconnect LANs L1-L5; link costs are 10 and 20. D = designated port for a LAN, R = root port for a bridge.]
1. Determine the root bridge, and set its ports in forwarding mode.
2. Each bridge determines its root port, and sets it in forwarding mode.
3. Bridges determine the designated port for each LAN segment.
4. All other ports are in the blocked state.
Figure 1.2: Bridged extended LAN and corresponding graph. The bridge forwards frames along the spanning tree, according to the FDB.
1.3 Transparent bridging IEEE 802.1D
Ethernet LANs broadcast each packet to every device on the LAN. The throughput per host decreases with the number of hosts connected to the LAN. See Problem 1.
Transparent bridging prevents this by interconnecting LAN segments (collision domains) and forwarding unicast packets according to a filtering database (FDB). Broadcast, multicast, and unknown-unicast packets are flooded to all LANs, so all segments form a single broadcast domain.
A bridge has two or more ports. Packets from incoming ports are forwarded to outgoing ports along
a spanning tree to prevent loops, according to FDB. See Figure 1.2.
Spanning tree algorithm: elect one root, then take shortest paths to the root.
Learning process: produces the FDB by relating each MAC source address to its incoming port and removing unrefreshed entries.
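The learning and filtering behavior above can be sketched in a few lines. This is an illustrative sketch, not the 802.1D state machine; the class and method names are my own.

```python
import time

class LearningBridge:
    """Sketch of a learning bridge: the FDB maps MAC addresses to ports."""
    def __init__(self, ports, ageing_time=300.0):
        self.ports = ports                # e.g. [1, 2, 3]
        self.ageing_time = ageing_time    # seconds before an entry expires
        self.fdb = {}                     # MAC address -> (port, last refresh)

    def handle(self, sa, da, in_port, now=None):
        """Process one frame; return the list of ports to forward it on."""
        now = time.time() if now is None else now
        # Learning: relate the MAC source address to the incoming port.
        self.fdb[sa] = (in_port, now)
        # Ageing: remove unrefreshed entries (normally a background task).
        self.fdb = {m: (p, t) for m, (p, t) in self.fdb.items()
                    if now - t < self.ageing_time}
        # Filtering: forward per FDB, or flood unknown unicast.
        if da not in self.fdb:
            return [p for p in self.ports if p != in_port]   # flood
        out_port, _ = self.fdb[da]
        return [] if out_port == in_port else [out_port]     # filter/forward
```

Note that a frame whose destination lies on the arrival port is filtered (forwarded nowhere), while an unknown destination is flooded on all other ports.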
Bridges exchange configuration messages to establish topology and topology-change messages to
indicate that STA should be rerun.
With a fixed number of bridge ports, throughput per LAN segment decreases with the number of segments in an extended LAN. See Problem 2.
Figure 1.3: LAN vs VLAN topology
Figure 1.4: VLAN tags
1.4 LAN switches IEEE 802.1Q
A LAN switch is a bridge with as many ports as there are LAN segments, and with enough capacity to handle the traffic on all segments. Problem 2 is solved through VLANs.
A virtual LAN (VLAN) is a collection of LAN segments and attached devices with the properties of an independent LAN. Each VLAN is a separate broadcast domain: traffic on one VLAN is restricted from going to another VLAN. Traffic between VLANs goes through a router.
A 4-byte VLAN tag carrying a VLAN identifier (VID) is added to MAC frames, so switches can forward packets to ports with the same VID. The FDB is augmented to include, for each VID, the ports (the member set) through which members of that VLAN can be reached.
The member set is derived from VLAN registration information: (i) explicitly, by management action, or (ii) by the GARP VLAN Registration Protocol (GVRP). GARP is the Generic Attribute Registration Protocol.
Multicast filtering. A VLAN is a single broadcast domain. If multicast messages are broadcast, the throughput is limited by the slowest link: a switch with 124 10-Mbps ports has a capacity of 1.24 Gbps but can transmit at most 6 multicast video channels of 1.5 Mbps over a 10-Mbps port. The GARP Multicast Registration Protocol (GMRP) (IEEE 802.1P) allows switches to limit multicast traffic along the spanning tree. (See IGMP.)
JOIN: a host sends this message to express interest in joining a multicast group. The switch adds the port to the multicast group and forwards the multicast source to these ports. JOIN messages are sent once every JOINTIME timeout.
LEAVE: message sent by a host. The switch removes this port from the multicast group unless another host on that port sends a JOIN message before the LEAVETIME timeout.
LEAVEALL: message periodically sent by the switch.
When a host sends IP data to a multicast (Class D IP) address, it inserts the low-order 23 bits of the IP address into the low-order 23 bits of the MAC address. So a NIC that is not part of the group ignores these data.
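The mapping just described is the standard one under the fixed 01:00:5e Ethernet multicast prefix; a small sketch (function name is mine):

```python
import ipaddress

def multicast_mac(group_ip):
    """Map a Class D IP address to its Ethernet multicast MAC address by
    placing the low-order 23 bits of the IP in the low-order 23 bits of
    the MAC, under the 01:00:5e prefix. The top 5 variable bits of the
    IP are discarded, so 32 IP groups share one MAC address."""
    ip = int(ipaddress.IPv4Address(group_ip))
    assert ip >> 28 == 0b1110, "not a Class D (multicast) address"
    mac = (0x01005E << 24) | (ip & 0x7FFFFF)
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))
```

Because 5 bits are lost, the NIC filter is only approximate: a host may still receive frames for another group that maps to the same MAC, and the IP layer discards them.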
Figure 1.5: The IP header provides precedence and type of service fields
Quality of service. The 3-bit precedence field allows 8 priority levels. The ToS bits are D (minimize delay), T (maximize throughput), R (maximize reliability), C (minimize cost).
802.1 provides no support for priority. 802.1P provides in-band QoS signalling with 8 COS levels. A
conforming bridge or switch maintains 8 queues. (VLAN tags may also carry priority information.)
1.5 Problems
1. The throughput per host decreases with the number of hosts connected to the LAN. Formulate two mathematical models, one deterministic and one stochastic, in which this statement is an assertion. Then prove or disprove the assertion. You will have to model LAN speed, host load, and throughput.
Hint: Use M/M/1 model of section 3.3.
2. Follow Figure 1.2 and propose a graph model for an extended bridged LAN in which bridges may have multiple ports.
(a) Use the graph to formulate two mathematical models, one deterministic and one stochastic, within which one can determine the throughput per LAN segment.
(b) How would you formulate as a mathematical assertion the statement "the throughput per LAN segment increases with the number of ports in the bridge"?
Hint: try the Jackson network model of section 3.3.
3. Discuss the differences between STP and OSPF in terms of throughput or efficiency in link
utilization.
Chapter 2
Processor architecture
2.1 Datapaths
When a packet arrives at a bridge,
the DA is searched in the forwarding table (DA → output ports). If not found, the packet is broadcast to all output ports;
if found, it is forwarded across the switching fabric to the appropriate output port (or ports for multicast);
the SA is learned and added to the forwarding table;
during transfer to the fabric the packet may be stored, or dropped if storage is full;
the packet is stored in the output port queue (usually FIFO) and eventually transmitted.
When a packet arrives at a router,
the DA is searched in the forwarding table. If not found, the packet is dropped;
if found, the next-hop MAC address is appended, the TTL is decremented, a new Header Checksum is calculated, and the packet is forwarded across the switching fabric to the output port or ports;
during transfer to the fabric the packet may be stored: if storage is full, this (or another) packet may be dropped;
the packet is stored in the output queue (FIFO or more complex) and eventually transmitted.
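The per-hop header rewrite (TTL decrement plus checksum refresh) can be sketched as follows. Real routers typically use the incremental update of RFC 1624; recomputing the full Internet checksum is simpler to show. This assumes a 20-byte header without options, and the function names are mine.

```python
def ipv4_checksum(header):
    """Internet checksum: one's-complement sum of 16-bit words."""
    s = 0
    for i in range(0, len(header), 2):
        s += (header[i] << 8) | header[i + 1]
    while s >> 16:                       # fold carries back in
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def forward_header(header):
    """Per-hop rewrite: decrement TTL (byte 8) and refresh the header
    checksum (bytes 10-11), as in the router datapath above."""
    assert header[8] > 0, "TTL expired; packet would be dropped"
    header[8] -= 1
    header[10:12] = b"\x00\x00"          # zero checksum before recomputing
    c = ipv4_checksum(bytes(header))
    header[10:12] = c.to_bytes(2, "big")
    return header
```

A header whose checksum field is correct sums (after folding) to 0xFFFF, so validating it means checking that `ipv4_checksum` over the whole header returns 0.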
When a cell arrives at an ATM switch,
its VCI is searched in the forwarding table (the VC translation table: (VCI in, Port in) → (VCI out, Port out)). If not found, the cell is dropped;
if the VCI is policed, the policing function determines whether the cell is conformant. If not, it may be dropped. If it is, the cell is forwarded across the switching fabric to the output port;
During transfer, cell may be stored: if storage is full, this or another cell may be dropped;
Cell is stored in output queue and eventually transmitted. Service discipline may be FIFO or
very elaborate.
Figure 2.1: Basic packet processor architecture
Throughput in A is limited by CPU speed;
In B, there is a choice of which CPU to forward the packet to;
In C, the packet travels the bus only once, so throughput is limited by bus speed;
In D, several packets can be forwarded through the crossbar in parallel.
General-purpose CPUs are not well suited for applications in which packets flow through. CPUs are better when the same data are examined several times, making use of the cache.
[Control path: routing, congestion control, reservation. Data path (per-packet processing), elaborated: forwarding decision, switching fabric, policing, scheduling.]
Figure 2.2: Elaboration of datapath functions
2.2 Performance
The packet delay through a switch fabric consists of the time (1) for the forwarding decision, and (2) to transfer the packet across the switch.
The packet delay through a processor consists of the time (1) for the policing decision, (2) for the forwarding decision, (3) to transfer across the switch, and (4) for the output scheduling decision.
[Timeline: header arrival, forwarding decision time, switch transfer time, output scheduling decision time; the packet size and the minimum back-to-back packet spacing determine the packet arrival rate.]
Figure 2.3: Delay of switch and packet processor
2.3 Forwarding decision
Criteria: (1) speed of address lookup, which depends on the number of memory references; (2) size of memory.
ATM switches perform direct lookup, Figure 2.4. The VCI address space is 2^24 = 16 M. Most switches contain far fewer entries, since it is the downstream switch that chooses a VCI that fits in the supported address space (PNNI).
For multicast, the lookup returns a list of output ports, each with a different VCI.
[Direct lookup in DRAM: address = VCI, data = (port, new VCI).]
Figure 2.4: ATM switches perform direct lookup
[CAM: a 48-bit network address is presented; on a hit, the CAM returns the associated data and the location of the entry (log2 N bits for a size-N memory).]
Figure 2.5: CAM or Content addressable memory. The 48-bit MAC address is presented. A successful parallel search asserts the hit signal and returns a pointer to the entry where the forwarding information for the MAC address is stored.
Bridges. The address space is 2^48, so direct lookup is not possible. Three indirect lookup techniques:
Associative memory. Figure 2.5. Typical CAM sizes are limited to a modest number of entries, so CAMs are not suitable for large LANs, which must support many more entries.
[Hashing: a 48-bit address is hashed to a 16-bit value that indexes a DRAM containing N linked lists holding the M addresses.]
Figure 2.6: A 48-bit address is presented and the hashing function returns a pointer to one of N linked lists. The search through a linked list takes a random time proportional to the length of the list.
Hashing. For large LANs hashing is an option. Suppose the LAN has M hosts. A hashing function, h, maps a host's 48-bit address into a forwarding table with, say, N entries, as in Figure 2.6. Two addresses a ≠ b may collide: h(a) = h(b). The entry points to a linked list of (MAC address, forwarding data) pairs for the MAC addresses that map into the same entry. The list must be searched sequentially to locate the MAC address. The duration of the search is proportional to the length of the list.
Suppose h maps the M MAC addresses a_1, ..., a_M into the N linked lists h(a_1), ..., h(a_M). Assume that h(a_1), ..., h(a_M) are independent, uniformly distributed over {1, ..., N}. The length of the i-th list is the random number

L_i = Σ_{m=1}^{M} 1{h(a_m) = i}.   (2.1)

Let ρ = M/N. If ρ is small (the number of lists is larger than the number of addresses), the lists will usually have 0 or 1 element. Problem 3 asks to find the distribution of L_i. For ρ small, the mean length of a list is about ρ. However, L_i being random, there is a chance that some lists (and the corresponding search time) may be very large. For real-time applications, you may store forwarding tables in such a way (e.g. as trees) that retrieval has a deterministic bound.
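The distribution of list lengths is easy to observe by simulation; a sketch (function name is mine), hashing M addresses uniformly into N buckets:

```python
import random
from collections import Counter

def chain_lengths(M, N, seed=0):
    """Hash M addresses uniformly into N buckets and return the list
    lengths L_i, simulating the chained forwarding table of Figure 2.6."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(N) for _ in range(M))
    return [counts[i] for i in range(N)]
```

With, say, M = 1000 and N = 8192 (ρ ≈ 0.12), the average length is ρ and most lists are empty or hold one entry, but the maximum over the 8192 lists is typically several entries, illustrating the random worst-case search time.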
Prefix            Outgoing port
128.32.0.0/16     1
128.32.239.0/24   7
128.32.239.3/32   3

Figure 2.7: Forwarding table with CIDR
IP routers. With CIDR, router forwarding table entries are identified by a pair (route prefix, prefix length), with prefix length between 0 and 32 bits. See Figure 2.7. The entry 128.32.0.0/16 has a 16-bit prefix.
The forwarding decision must find the longest prefix match between the packet's destination IP address and the prefixes in the forwarding table.
CIDR reduces the table size, but the forwarding decision is more complex. See [9].
With declining memory cost, it may be more economical to expand the prefixes and use simpler,
exact matching algorithms.
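Longest-prefix match on the table of Figure 2.7 can be sketched by a linear scan (real routers use tries or hardware; the function name is mine):

```python
import ipaddress

# Forwarding table of Figure 2.7: (prefix, outgoing port).
TABLE = [
    (ipaddress.ip_network("128.32.0.0/16"), 1),
    (ipaddress.ip_network("128.32.239.0/24"), 7),
    (ipaddress.ip_network("128.32.239.3/32"), 3),
]

def lookup(dst):
    """Return the port of the longest matching prefix, or None (no match)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in TABLE:
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, port)
    return best[1] if best else None
```

Note how 128.32.239.3 matches all three entries but is forwarded on port 3, the /32 match.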
Caching. The forwarding decision delay can be reduced by caching. The idea is that the IP destination addresses of successive packets are correlated.
The cache stores the full source and destination IP addresses and the corresponding forwarding decision (including perhaps the entire replacement IP header).
When a packet arrives, the SA and DA are used to do a full match in the local cache. If the addresses are not there, the packet is forwarded to a central routing processor. A cache replacement rule is needed if there is a cache miss.
The improvement in delay depends on (1) the ratio of the cache size to the size of the forwarding table, and (2) the temporal locality. The latter is likely to be higher in a campus router than in an edge router, and higher there than in a core router. See Problem 4.
Multicast. Some routers support multicast. The simplest rule is RPF (reverse-path forwarding): if a multicast packet arrives on port p from source S, look up S in the forwarding table. If p is the best port to reach S, forward the packet on all ports except p.
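The RPF rule fits in a few lines; a sketch (names are mine), where `best_port_to` stands in for the unicast forwarding-table lookup:

```python
def rpf_forward(in_port, source, ports, best_port_to):
    """Reverse-path forwarding: flood the multicast packet on all other
    ports only if it arrived on the port used to reach its source."""
    if best_port_to(source) != in_port:
        return []                                  # fails RPF check: drop
    return [p for p in ports if p != in_port]
```

The check prevents forwarding loops: a packet arriving off the shortest path back to its source is simply dropped.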
Switching fabrics. We first need some queuing models.
2.4 Problems
1. For a commercial LAN switch, find the various times in Figure 2.3. Also give the throughput.
See, for example, www.bcr.com/bcrmag/08/98p25.htm
2. If forwarding decision, switch transfer, and output scheduling can be pipelined, what is the
throughput of the processor?
3. Find the (marginal) distribution of the L_i given in (2.1), and calculate the mean length of a list. Show that for ρ = M/N small, the mean is approximately ρ.
Find the joint distribution of (L_1, ..., L_N). Verify that it has the product form of independent Poisson variables conditioned on their sum; here Σ_i L_i = M, so the denominator is the normalizing constant.
Take particular values of M and N, and find the probability that some list is much longer than the mean.
Suppose a memory access takes 100 ns. Consider back-to-back Ethernet packets. What is the average throughput of this switch, using the model of Figure 2.3 and ignoring the output scheduling decision delay?
4. The packets arriving at a line card belong to several multiplexed TCP connections.
(a) Formulate a model of packet arrivals with, say, K simultaneous connections, in which connections last a random amount of time with a geometric distribution and a given mean.
(b) Suppose the size of the cache is C. If there is a cache miss, an existing entry is replaced by the missing entry. How would you calculate the hit ratio as a function of C?
(c) Suppose you are given a typical trace of the addresses of packet arrivals, but no model of the arrival process. You want to know how big a cache you would need so that the hit ratio is a certain value, say h. What would you do?
(d) The time to search the cache is T_c, the time to search the central forwarding table is T_f, and the hit ratio is h. How would you decide if it's worth having a cache?
Chapter 3
Queuing
3.1 Discrete time Markov chains
{X_t, t = 0, 1, ...} is a Markov chain with state space S, finite or countable, stationary transition probability matrix P, and initial distribution π_0. So

P(X_{t+1} = j | X_t = i, X_{t-1}, ..., X_0) = P(i, j)   (3.1)

for all i, j, t. π_t is the marginal distribution of X_t, written as a row vector. From (3.1),

π_{t+1} = π_t P.   (3.2)

π is invariant if it satisfies the balance equations

π = π P.   (3.3)

P is irreducible if the chain can go from any state i to any other state j (with positive probability). Irreducible chains have at most one invariant distribution. The chain is positive recurrent if it has one invariant distribution.
If P is irreducible,

(1/T) Σ_{t=1}^T 1{X_t = i} → π(i),   (3.4)

i.e. π(i) is the fraction of time X spends in state i.
P is aperiodic if d = 1, where d = gcd{t ≥ 1 : P^t(i, i) > 0}. If d > 1, P is periodic with period d.
3.2 Continuous-time Markov chains
[Trajectory x_t passing through sets S1, S2, S3 at times t1, t2, t3.]
Figure 3.2: A trajectory in the event A of the theorem

Theorem (Markov property) For any set A of future trajectories,

P((x_s, s ≥ t) ∈ A | x_u, u ≤ t) = P((x_s, s ≥ t) ∈ A | x_t).

Such an A is of the form {x_{t1} ∈ S1, x_{t2} ∈ S2, x_{t3} ∈ S3}, with t ≤ t1 < t2 < t3. See Figure 3.2.
The rate matrix Q is irreducible if the chain can go from any state to any other state through transitions of positive rate; {x_t} is irreducible if Q is irreducible.

Theorem Suppose {x_t} is a c-t Markov chain with rate matrix Q and initial distribution π_0. Then
1. π is invariant (π_t = π for all t) iff the balance equations hold:

π Q = 0.   (3.6)

2. Q has at most one invariant distribution π, and then the fraction of time spent in each state i converges to π(i);
3. If Q has no invariant distribution, the fraction of time spent in any state converges to 0.
Theorem (Time reversal) Suppose {x_t} is a stationary, c-t Markov chain with rate matrix Q and distribution π. The time-reversed process y_t = x_{T-t} is stationary, Markov, with distribution π and rate matrix Q~, where

q~(i, j) = π(j) q(j, i) / π(i).

Why? π(i) q~(i, j) = π(j) q(j, i): the rate of transitions i → j in reversed time equals the rate of transitions j → i in forward time.
[Plot of the queue process x_t, with arrival and departure instants marked.]
Figure 3.3: Diagrams for the M/M/1 system. Arrivals (blue) and departures (red) form Poisson processes.
3.3 M/M/1 model
See Figure 3.3. The balance equation (3.6) is

λ π(n - 1) + μ π(n + 1) = (λ + μ) π(n), n ≥ 1;   λ π(0) = μ π(1),

which has a (unique) solution iff λ < μ:

π(n) = (1 - ρ) ρ^n, n ≥ 0, with ρ = λ/μ.   (3.7)
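Summing the geometric invariant distribution π(n) = (1 - ρ)ρ^n gives the familiar mean queue length ρ/(1 - ρ); a quick numerical check (function name is mine):

```python
def mm1_mean_queue(rho, nmax=10_000):
    """Mean of the M/M/1 invariant distribution pi(n) = (1 - rho) * rho**n,
    summed numerically; it matches the closed form rho / (1 - rho)."""
    return sum(n * (1 - rho) * rho**n for n in range(nmax))
```

For example, at 90 percent utilization the mean is 9 packets, and at ρ = 10/11 it is 10 packets.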
The queue {x_t} is time-reversible, because

π(n) q(n, n + 1) = π(n) λ = π(n + 1) μ = π(n + 1) q(n + 1, n),

so the rate matrix of the time-reversed process, Q~, is the same as that of Q.
So the departures before time t form a Poisson process with rate λ, independent of x_t. Surprise!
The mean queue length is

E(x) = Σ_n n π(n) = ρ / (1 - ρ).

For ρ = 10/11, the mean is 10 packets.
Above,
λ = av. number of exponential packet arrivals per sec,
μ = av. number of packets that can be transmitted per sec,
ρ = λ/μ = av. utilization.
A packet arriving at time t sees n packets in queue with probability

P(x_t = n | packet arrives in (t, t + dt))
  = P(packet arrives in (t, t + dt) | x_t = n) π(n) / P(packet arrives in (t, t + dt))
  = π(n),

so the average time between arrival and departure (including packet service or transmission time) is

T = Σ_n π(n) (n + 1)/μ = 1 / (μ - λ).

Alternatively, T = E(x)/λ by Little's law.
Example Consider a 10 Gbps link. Packet lengths are exponentially distributed with mean length 10,000 bits.1 So μ = 10^6 packets/s and 1/μ = 1 μs per packet.
Link utilization is 90 percent, i.e. ρ = 0.9. Then the average number of packets in the buffer is ρ/(1 - ρ) = 9. The average delay faced by a packet, including its own service (transmission) time, is 1/(μ - λ) = 10 μs.
If the packet goes through 10 nodes the average delay is 100 μs (assuming independence of nodes).
For a 100 Mbps link, with the same packet length distribution, μ = 10^4 packets/s, 1/μ = 100 μs/packet, and the average delay is 1000 μs per link.
The probability of 100 or more packets in the buffer is ρ^100 ≈ 2.7 × 10^-5.
Compare the queuing delay with the propagation delay of 5 μs/km = 15 ms for a 3,000 km link.
The possible number of bits in the 3,000 km, 10 Gbps link is 15 ms × 10 Gbps = 1.5 × 10^8.
1What is a more realistic distribution?
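A quick check of the arithmetic in this example (variable names are mine):

```python
# Parameters of the 10 Gbps example above.
C = 10e9                     # link rate, bits/s
L = 10_000                   # mean packet length, bits
mu = C / L                   # service rate: 1e6 packets/s
rho = 0.9                    # utilization
lam = rho * mu               # arrival rate, packets/s

mean_queue = rho / (1 - rho)         # average number of packets in buffer
delay = 1 / (mu - lam)               # average delay incl. transmission, s
p_100 = rho ** 100                   # P(100 or more packets in buffer)
bits_in_flight = 3000 * 5e-6 * C     # 3,000 km at 5 us/km, bits
```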
Alternative formulation Let {A_t} be a Poisson counting process with rate λ, the arrival process, and let {D^_t} be a Poisson counting process with rate μ, the virtual service process; {A_t} and {D^_t} are independent.
The queue at t is given by the reflection at 0 of A_t - D^_t: virtual services that occur while the queue is empty are wasted.
The departure counting process is D_t = A_t - x_t (for x_0 = 0); D is also Poisson. Moreover,
Future arrivals, {A_s - A_t, s ≥ t}, and the current state, x_t, are independent;
Past departures, {D_s, s ≤ t}, and the current state, x_t, are independent.
[Node i of the network: external traffic arrives at rate γ_i pkt/sec, traffic from the network arrives from nodes j with routing probabilities r(j, i), and line i transmits at rate μ_i pkt/sec.]
Figure 3.4: Parameters of a Jackson network
Jackson network See Figure 3.4. Assumptions:
Independent, exponential service times with rate μ_i at node i;
Markovian routing: a packet leaving node j goes to node i with probability r(j, i);
Poisson external arrivals at rate γ_i packets/sec;
The aggregate arrival rate into node i is λ_i, where

λ_i = γ_i + Σ_j λ_j r(j, i), all i.   (3.8)

Let x_t = (x_t(1), ..., x_t(I)) be the queue-length process. This is Markovian. Problem 5 asks to find its rate matrix.
Theorem Assume λ_i < μ_i, all i. Then x_t has an invariant distribution of the product form:

π(n_1, ..., n_I) = π_1(n_1) ··· π_I(n_I), where π_i(n_i) = (1 - ρ_i) ρ_i^{n_i} with ρ_i = λ_i / μ_i.

This is a surprising result. The departures from a node in the Jackson network need not be Poisson, unlike the case of a single M/M/1 system.
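The flow equations (3.8) can be solved by simple fixed-point iteration; a sketch (function name is mine):

```python
def traffic_rates(gamma, r, iters=200):
    """Solve the flow equations (3.8),
    lambda_i = gamma_i + sum_j lambda_j * r[j][i],
    by fixed-point iteration (converges when the routing matrix r is
    substochastic, i.e. every packet eventually leaves the network)."""
    n = len(gamma)
    lam = list(gamma)
    for _ in range(iters):
        lam = [gamma[i] + sum(lam[j] * r[j][i] for j in range(n))
               for i in range(n)]
    return lam

# Two-node tandem: external traffic enters node 0, then visits node 1.
lam = traffic_rates(gamma=[1.0, 0.0], r=[[0.0, 1.0], [0.0, 0.0]])
```

Given service rates μ_i with λ_i < μ_i, the theorem then gives each marginal as the M/M/1 distribution with ρ_i = λ_i/μ_i.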
[Birth-death diagram: from state n the arrival rate is λ and the service rate is min(n, m)μ, i.e. μ, 2μ, ..., (m-1)μ, mμ, mμ, ...; an arriving request is routed to the first free server.]
Figure 3.5: The M/M/m/∞ system
3.4 Other M/M/m/n models
M/M/m, the m-server case The received request is routed to the first of the m servers that is available, Figure 3.5. The buffer is infinite. The balance equations are

λ π(n - 1) = min(n, m) μ π(n), n ≥ 1.

This gives

π(n) = π(0) (λ/μ)^n / n! for n ≤ m;   π(n) = π(m) (λ/(mμ))^{n-m} for n ≥ m.   (3.9)

It is assumed that λ < mμ. π(0) is obtained using Σ_n π(n) = 1.
A packet arriving at time t sees all servers busy (x_t ≥ m) with probability

P(x_t ≥ m | packet arrives in (t, t + dt))
  = P(packet arrives in (t, t + dt) | x_t ≥ m) P(x_t ≥ m) / P(packet arrives in (t, t + dt))
  = Σ_{n ≥ m} π(n)

from (3.9).
The expected number of packets waiting in queue (not in service) is

E(Q) = Σ_{n > m} (n - m) π(n).
By Little's law (see below), the average waiting time in queue (not in service) is E(Q)/λ, and the total latency (waiting plus service time) is E(Q)/λ + 1/μ.
3.5 Little's law
Suppose A(t) is the cumulative number of arrivals in [0, t] into a stable queueing system and x(t) is the number of packets in the system (including those in service). Let W_n be the latency of packet n, and let λ be the arrival rate. Suppose the queue is empty at t = 0 and A(t)/t → λ. From Figure 3.6, the time average of the queue size is

(1/t) ∫_0^t x(s) ds ≈ (1/t) Σ_{n=1}^{A(t)} W_n = (A(t)/t) (1/A(t)) Σ_n W_n.

Taking limits as t → ∞, and if time averages equal ensemble averages, we get E(x) = λ E(W).
[Plot of A(t) and x(t); S_n are the service times and W_n the latencies of successive packets.]
Figure 3.6: Calculations for Little's law
3.6 PASTA
We have used the PASTA property (Poisson arrivals see time averages) several times.
Consider a stationary queuing system with a deterministic service time of 3 and periodic arrivals (period 10). A sample path with arrivals at 1, 2, 3, 11, 12, 13, 21, 22, 23, ... and queue process x_t is shown in Figure 3.7.
Let π(n) be the probability that x_t = n at any time t, and let a(n) be the probability that an arriving packet sees n packets in the queue. For this system a(0) = a(1) = a(2) = 1/3, while π(0) = π(3) = 0.1 and π(1) = π(2) = 0.4, so the two probabilities are not the same.
Figure 3.7: PASTA property does not hold in this deterministic queuing system
Consider an M/G/1 system, with stationary probabilities π(n). Let a(n) be the probability that an arrival sees n packets in queue. Then,

a(n) = P(x_t = n | packet arrives in (t, t + dt))
     = P(packet arrives in (t, t + dt) | x_t = n) π(n) / P(packet arrives in (t, t + dt))
     = π(n),

using Bayes' rule, the independence of arrivals after t from x_t, and the independence of service times.
[Plot of the remaining work W(t), with the service times S_n and waiting times W_n of successive packets marked.]
Figure 3.8: Deriving the Pollaczek-Khinchin formula
3.7 Pollaczek-Khinchin formula
Consider an M/G/1 system with independent service times S_1, S_2, ..., distributed as S, and Poisson arrivals with rate λ. Let W(t) be the remaining waiting time, i.e. the amount of time needed to serve the packets in the system at t. Let S_n and W_n be the service time and waiting time of packet n; see Figure 3.8.
The time average of W(t) is

(1/t) ∫_0^t W(s) ds ≈ (1/t) Σ_n [W_n S_n + S_n^2 / 2],

since packet n contributes a parallelogram of area W_n S_n and a triangle of area S_n^2 / 2. Substituting and taking limits as t → ∞,

E W(t) = λ [E(W) E(S) + E(S^2)/2],

by the independence of W_n and S_n. By PASTA, E W(t) = waiting time faced by an arrival = E(W). So,

E(W) = λ E(S^2) / (2 (1 - ρ)),

where ρ = λ E(S) is the utilization.
Note: The formula E(S_1 + ... + S_N) = E(N) E(S), involving a random sum of N terms, is sometimes called Wald's formula. A general version of Wald's formula is a consequence of the fact that Σ_{n ≤ t} (S_n - E S) is a martingale. See Problem 8.
Determinism minimizes waiting
In general, E(S^2) ≥ (E S)^2, so

E(W) ≥ λ (E S)^2 / (2 (1 - ρ)),

where the last expression is the waiting time for a deterministic service time (e.g. ATM cells).
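The P-K mean wait λ E(S^2)/(2(1 - ρ)), and the advantage of deterministic service, can be checked by simulating the waiting-time (Lindley) recursion; a sketch, with function names of my choosing:

```python
import random

def mg1_mean_wait(lam, service, n=200_000, seed=1):
    """Estimate the mean M/G/1 waiting time (excluding service) by the
    Lindley recursion W_{k+1} = max(W_k + S_k - T_k, 0), where T_k are
    exponential interarrival times with rate lam."""
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    for _ in range(n):
        total += w
        s = service(rng)              # sample a service time
        t = rng.expovariate(lam)      # sample the next interarrival time
        w = max(w + s - t, 0.0)
    return total / n
```

At λ = 0.5 with unit-mean service, P-K predicts a mean wait of 0.5 for deterministic service (E(S^2) = 1) and 1.0 for exponential service (E(S^2) = 2): determinism halves the waiting time.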
Chapter 4
Switching
4.1 Packet switching
Architectures: IQ/HOL, VoQ, SQ.
[IQ: HOL blocking. OQ: needs a faster switch. VoQ: needs matching. SQ: reduces buffer size.]
Figure 4.1: Packet switch architectures
4.1.1 Architectures
Second generation: the PRIZMA architecture, with 2 Gbps ports ... all on one chip [15]
[Plot: average delay in cell times vs utilization. Diagram of the virtual HOL queue: arrivals A_t join x_t, and min(1, x_t) HOL packets are read per slot.]
Figure 4.2: Virtual HOL queue
4.1.2 Input queues
Assume:
discrete time t = 0, 1, 2, ...; independent arrivals; uniform destinations, each with probability 1/N;
N large, so the total number of port-1 arrivals is Poisson;
A_t = port-1 arrivals.
The virtual HOL queue of the x_t port-1 packets at the heads of the input queues evolves as

x_{t+1} = x_t - min(1, x_t) + A_{t+1},   (4.1)

where A_{t+1} = number of new port-1 packets that come to the heads of unblocked queues.
Suppose the equilibrium probability that a queue is unblocked is λ_0; λ_0 is then also the per-port throughput. Then A_t is Poisson with mean λ_0. From (4.1), E A_t = E min(1, x_t),
so P(x_t > 0) = λ_0. Square (4.1) and take expectations; in steady state this gives

E x = λ_0 + λ_0^2 / (2 (1 - λ_0)).   (4.2)

Since the N saturated inputs always hold N HOL packets, spread uniformly over the N outputs, E x = 1. Also,

# blocked queues = N (1 - λ_0).

Setting E x = 1 in (4.2) gives λ_0^2 - 4 λ_0 + 2 = 0, so λ_0 = 2 - √2 ≈ 0.586.
That is, 42% of the switch bandwidth is not utilized.
Quick upper bound Same switch, but at the end of each cycle the IQs are flushed. With N inputs holding cells with uniform random destinations, the switch throughput is

N (1 - (1 - 1/N)^N) → N (1 - 1/e).

The per-port throughput is 1 - 1/e ≈ 0.63.
4.1.3 Virtual output queue
Each input port has N VoQs, one per output port.
If several input ports have packets for the same destination, which one should be served?
Assume i.i.d. arrivals A_t = (A_t(i, j)) with rates λ(i, j) such that

Σ_i λ(i, j) < 1,   Σ_j λ(i, j) < 1.

Service S_t = (S_t(i, j)) ∈ {0, 1} such that

Σ_i S_t(i, j) ≤ 1,   Σ_j S_t(i, j) ≤ 1.

Note: If equality holds above, S_t is a permutation matrix.
Queue lengths x_t = (x_t(i, j)) are such that

x_{t+1}(i, j) = x_t(i, j) - S_t(i, j) min(1, x_t(i, j)) + A_t(i, j).

Question: Given λ, find S_t, based on past arrivals and queue lengths, so that x_t is stable.
Conjecture: There always exists a stabilizing matching S_t.
[Inputs 1..N and outputs 1..N attach through a shared bus to a centralized memory holding queues 1..N; the figure traces the path of a packet from input 1 to output N.]
Figure 4.4: Switch with time-division shared bus and centralized shared memory
4.2 Shared queue
This architecture is used in most low-speed packet processors: a time-division bus with a centralized memory shared by all input and output lines, Figure 4.4. Up to N packets may arrive at one time and up to N may be read at one time, so the memory bandwidth must be 2N times the line rate. Assuming a 100 ns DRAM access time and a 53-byte-wide bus gives a total bandwidth of 4,240 Mbps. For a 16-port ATM switch, this gives a line rate of 4240/32 ≈ 132 Mbps.
Let
x_t = size of the 1-list (packets destined to output 1) at the beginning of slot t,
A_t = # of 1-packets arriving in slot t,

x_{t+1} = x_t - min(1, x_t) + A_t.   (4.5)

Following the same argument that led to (4.2) gives

E x = λ + (σ^2 + λ^2 - λ) / (2 (1 - λ)),

where λ = E A_t and σ^2 = Var A_t. For the Poisson case, σ^2 = λ, so

E x = λ + λ^2 / (2 (1 - λ)).   (4.6)
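For Poisson arrivals, (4.6) gives E x = λ + λ^2/(2(1 - λ)); the recursion (4.5) can be simulated to check this. A sketch (function names are mine):

```python
import math
import random

def poisson(rng, lam):
    """Sample Poisson(lam) by Knuth's method (fine for small lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sim_mean_queue(lam, slots=200_000, seed=2):
    """Simulate x_{t+1} = x_t - min(1, x_t) + A_t with Poisson(lam)
    arrivals and return the time-average queue size."""
    rng = random.Random(seed)
    x, total = 0, 0
    for _ in range(slots):
        x = x - min(1, x) + poisson(rng, lam)
        total += x
    return total / slots
```

At λ = 0.5, (4.6) predicts E x = 0.5 + 0.25/1 = 0.75, which the simulation reproduces.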
Shared vs separate queue
Suppose a buffer must be sized at E x + k σ to keep losses small, where σ is the standard deviation of x and k is a safety factor. Then, roughly,

separate buffer size = N (E x + k σ);
shared buffer size = N E x + k √N σ,

since the standard deviation of the sum of N independent queues is √N σ.
4.3 Output queue
In an output-queued switch the switch fabric and the output memory must run at N times the line rate. The queue length in port 1 is given by (4.5).
4.4 Problems
1. Assume that A_t is Poisson in (4.1) or (4.5). The mean queue size is given by (4.6).
(a) Is x_t Markov? Why?
(b) If x_t is stationary, how would you find its distribution?
Chapter 5
Matching
Crossbar switches need a controller to schedule the switch. The controller must find a good match, e.g. longest queue first, oldest cell first, etc.
It is too expensive to run a centralized maximum-matching algorithm, whose complexity grows rapidly with N. (A 40-byte packet at a line speed of 1 Gbps amounts to 360 ns/packet.)
So one may have to be satisfied with maximal matching, using distributed algorithms. Note that for
a fully-connected bipartite graph, a maximal matching is also maximum.
In case of QoS, the matching must satisfy some preferences.
5.1 The dating game
The GSA algorithm. Say that a man or woman is
free if she/he is not engaged or matched to any man/woman;
engaged if she/he is temporarily matched to some man/woman;
matched if she/he is terminally matched.
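The GSA here is the Gale-Shapley algorithm; a runnable sketch (function and variable names are mine, assuming complete preference lists on both sides):

```python
def gale_shapley(men_prefs, women_prefs):
    """Free men propose in preference order; an engaged woman trades up
    if she prefers the new proposer. Returns the matching woman -> man."""
    free = list(men_prefs)                      # all men start free
    next_choice = {m: 0 for m in men_prefs}     # next woman to propose to
    engaged = {}                                # woman -> man
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    while free:
        m = free.pop(0)
        w = men_prefs[m][next_choice[m]]        # first not yet proposed to
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m                      # w was free: engage
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])             # w prefers m: old man freed
            engaged[w] = m
        else:
            free.append(m)                      # w rejects m; m stays free
    return engaged
```

When the loop ends, every temporary engagement is final, and the resulting matching is stable.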
[Flowchart of the GS algorithm: all start free. While some man m is free, m proposes to w, the first woman he has not yet proposed to. If w is free, w becomes engaged to m. Otherwise w is currently engaged to m'; if w prefers m to m', w is matched with m and m' is set free, else m continues free. The algorithm ends when no man is free.]
Figure 5.1: The GS algorithm
[Bipartite diagrams of the requests, grants, and accepts in one RRM cycle on a 4 × 4 switch, with accept pointers a_1, a_3 and grant pointers g_2, g_4 marked.]
Figure 5.2: RRM showing pointers with N = 4.
5.2 Round-robin matching
Each input i maintains an accept pointer a_i. Each output j maintains a grant pointer g_j.
RRM cycle.
Step 1 Each input i requests all outputs j with x(i, j) > 0.
Step 2 Each output j grants the next requesting input at or after its current pointer value g_j, then increments g_j.
Step 3 Each input i accepts the next granting output at or after its current pointer value a_i. If a grant has been accepted, it increments a_i.
Figure 5.2 illustrates one RRM cycle. Initially all a_i = 1 and all g_j = 1. At the end of the cycle, the match and the updated pointer values are as shown in the figure.
5.2.1 Analysis of RRM
Under heavy load, the grant counters may get synchronized, reducing utilization. Consider N = 2 with all queues non-empty. Then it is possible for the outputs to grant the same input in every cycle, so that only one input is served per cycle; after four cycles the situation repeats. Throughput is 50 percent.
Of course the following TDM schedule is also possible, and has a throughput of 100 percent: the two inputs are matched to distinct outputs in every cycle.
Under heavy load, if the grant counters get synchronized at any time (i.e. have the same value), they'll stay synchronized forever.
Under light load, the grant counters will be randomly distributed, and the probability that some input is not served is correspondingly small.
[2 × 2 example: request rates λ(1,1) = λ(1,2) = λ(2,1) = 1; acceptance rates (1,1) = 1/4, (1,2) = 3/4, (2,1) = 3/4.]
Figure 5.3: PIM can be unfair under heavy load
5.3 Partial iterative matching, PIM
Step 1 Each unmatched input i sends requests to every output j such that x(i, j) > 0.
Step 2 Each output j randomly picks one input from its received requests and grants it.
Step 3 Each input i randomly accepts one of its received grants.
The I in PIM means that this cycle is iterated to improve the match.
5.3.1 Analysis of PIM
It appears that with uniform i.i.d. traffic, PIM achieves a maximal match in 3 iterations.
In heavy load, every input requests every output. The probability that an input receives no grant in one round equals (1 - 1/N)^N → 1/e.
PIM can be unfair. Figure 5.3 gives a 2 × 2 case where the request rates from input 1 to outputs 1 and 2, and from input 2 to output 1, all equal 1: these three requests are made in every slot. Output 1 then grants each of inputs 1 and 2 with rate 1/2, while output 2 always grants input 1. Iterating to a maximal match, input 1 will accept output 1 with probability 1/4 and output 2 with probability 3/4; input 2 will accept output 1 with probability 3/4.
Thus even though the arrival rates for output port 1 are equal at input ports 1 and 2, the acceptance rates are not the same.
5.4 iSLIP matching
The detailed reference is [4]. RRM suffers from synchronization of the grant counters. iSLIP modifies RRM slightly so that the grant counters are incremented only if the grant is accepted. So Step 2 of RRM is modified:
Step 2 Each output j grants the next requesting input at or after its current pointer value g_j, then increments g_j only if the granted input accepts output j.
5.4.2 Priority iSLIP
Suppose there are
priority levels. Then each input
maintains
VoQs, with
the
buffer occupancy of priority
and output
. Then
gives strict priority, i.e. serves
only
if
,
. Each input maintains counter
and each output maintains
for each
priority level.
Step 1 Each input i selects the highest priority level with a non-empty queue to output j, and requests output j at that level.
Step 2 Output j determines the highest priority level p among the requests it receives. The output then chooses one input among those inputs that have requested at level p. The output maintains a separate pointer g_j^p per level, and chooses the input among requests at level p in the same round-robin scheme. The output notifies each input whether or not its request is granted. The pointer g_j^p is incremented only if the granted input accepts output j.
Step 3 If input i receives any grants, it determines the highest priority level grant, say level q. The input then chooses one grant among the grants at this level. This is done according to the counter a_i^q, which is incremented to one beyond the accepted output. The input then notifies each output whether or not its grant was accepted.
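Step 1 above reduces to a simple scan at each input. The following sketch (names and data layout are assumptions, not from the source) picks, per output, the highest-priority non-empty VoQ; here smaller p means higher priority.

```python
def request_levels(voq_len):
    """Step 1 of priority iSLIP at one input. voq_len[p][j] is the
    occupancy of the priority-p VoQ for output j; smaller p is higher
    priority. Returns {output: requested priority level}."""
    req = {}
    for j in range(len(voq_len[0])):
        for p in range(len(voq_len)):
            if voq_len[p][j] > 0:
                req[j] = p  # strict priority: stop at first backlog
                break
    return req

# Input with 2 priority levels, 3 outputs: high-priority backlog for
# output 0, low-priority backlog for outputs 0 and 2.
voq = [[3, 0, 0],   # priority 0 (highest)
       [1, 0, 2]]   # priority 1
print(request_levels(voq))  # {0: 0, 2: 1}
```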
5.4.3 Threshold iSLIP
It may be better to select a weighted maximal match with weights corresponding to queue length.
If queue lengths are quantized in threshold levels
, then priorities may be assigned
accordingly as
.
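The quantization is a threshold lookup. A minimal sketch, with assumed example thresholds (the source does not give concrete values); here a larger level number corresponds to a longer queue:

```python
import bisect

def threshold_priority(queue_len, thresholds):
    """Map a queue length to a priority level by quantizing it against
    ascending thresholds T_1 < T_2 < ...; returns how many thresholds
    the length has reached."""
    return bisect.bisect_right(thresholds, queue_len)

thresholds = [1, 10, 100]  # assumed example levels
print([threshold_priority(q, thresholds) for q in (0, 5, 10, 50, 500)])
# [0, 1, 2, 2, 3]
```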
5.4.4 Weighted iSLIP
Suppose bandwidth from
to
is to be shared according to the ratio
subject
to
,
.
In iSLIP each counter is an ordered circular list
. Now expand the list at output
to
where
is the lcd of
and input
appears
times in the list.
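The list expansion can be sketched as follows (an illustration under the notation above; the function name and the simple unshuffled ordering are assumptions):

```python
from fractions import Fraction
from math import lcm

def expand_list(shares):
    """Build the expanded round-robin list for one output. shares maps
    input -> Fraction of the output's bandwidth (summing to <= 1).
    The list has length D = lcm of the denominators, and input i
    appears shares[i] * D times."""
    D = lcm(*(f.denominator for f in shares.values()))
    order = []
    for i, f in shares.items():
        order += [i] * (f.numerator * (D // f.denominator))
    # Interleaving the entries would smooth short-term service; this
    # plain ordering already gives the correct long-run ratios.
    return order

shares = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
print(expand_list(shares))  # [1, 1, 1, 2, 2, 3]
```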
Figure 5.4: Interconnection of the input and output arbiters to construct the iSLIP scheduler
5.4.5 Implementation
Figure 5.4 shows how the iSLIP scheduler for an N x N switch is constructed from the input and output arbiters.
The state memory records whether each input queue is empty. From this memory, an N^2-bit wide vector presents N bits to each of the N output grant arbiters, representing Step 1 (request).
The grant arbiters select a single input among the contending requests to implement Step 2 (grant).
The grant decisions are presented to the N accept arbiters, each of which selects at most one output on behalf of each input to implement Step 3 (accept).
The final decision is stored in the decision registers and the values of the grant and accept pointers are updated. The decision register is used to notify each input which cell to transmit and to configure the crossbar switch.
Chapter 6
Network processors
Figure 6.1 is a logical diagram of how a network processor (NP) fits in a system design. The
NP is located between the physical layer (MAC or framer) and the switch fabric. In the figure
the Serializer/Deserializer (SERDES) is the interface between the NPU and switch fabric. The
framer or MAC presents a packet to the NPU, which must examine it, parse it, do the necessary edits
and database lookups to enforce various policies at layers 3-7 (forwarding, queuing, labels), and
exchange messages with the switch controller. The NP is in the data path.
Figure 6.1: Location of NP in a logical diagram. Source [17].
6.0.6 NP operation
Figure 6.2 shows a generic block diagram. Data of multiple physical interfaces or the switch fabric
are transferred to/from the NP. The bitstream processors receive the serial stream of packet data and extract the information needed to process the packet, such as the MAC or IP source/destination address, TOS bits, TCP port numbers, and MPLS or VLAN tags. The packet is then written into the packet buffer memory. The extracted information is fed to the processor complex, the programmable unit of the NP. Under program control, the processor may extract additional information from the packet and submit relevant information to the search engine, which looks up the MAC or IP address, classifies the packet, or does a VCI/VPI lookup using the routing/bridging tables. Upon packet transmission through the bitstream processor, the necessary modifications to the packet header are performed.
Figure 6.2: Generic NP architecture. Source [15].
Figure 6.3: Time to process 40B packets at different line rates. Source [17]
6.0.7 Speed of operations
Figure 6.3 shows the time available to process back-to-back 40B packets at different line speeds. At 1 Gbps, the time to process one packet is 360 ns. Using 10-ns SRAM permits a maximum of 36 memory accesses. Thus faster line rates can be accommodated only by processing several packets simultaneously, in a pipelined or parallel fashion.
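As a rough cross-check, the time budget scales inversely with line rate. The sketch below counts only the raw serialization time of the 40 payload bytes, so its numbers come out somewhat below those in the figure, which presumably also accounts for framing overhead.

```python
def packet_time_ns(pkt_bytes, line_rate_gbps):
    """Serialization time of one packet in ns: bits / (Gbit/s) = ns."""
    return pkt_bytes * 8 / line_rate_gbps

SRAM_CYCLE_NS = 10
for rate in (1, 2.5, 10, 40):
    t = packet_time_ns(40, rate)
    print(f"{rate} Gbps: {t:.0f} ns per 40B packet, "
          f"{int(t // SRAM_CYCLE_NS)} SRAM accesses")
```

At 40 Gbps the budget drops below one SRAM cycle per packet, which is why pipelining or parallelism becomes unavoidable.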
6.0.8 Packet buffer memory
For the architecture of figure 6.2, each packet header byte may traverse the memory interface at least four times:
write the inbound packet
read the header into the processor complex
write the edited header back to memory
read the packet for outbound transmission
So for 40-byte back-to-back packets the required memory interface capacity is 10-120 Gbps for line rates of 2.5-40 Gbps.
Chapter 7
Distributed switch
Figure 7.1: A distributed switch is a network of switches with a certain number of input and output ports: N input nodes, M output nodes, and internal links of capacity 1
7.2 Clos network
This is a 3-stage network as illustrated in Figure 7.2. The Clos network is specified by 5 numbers (n_IN, r_IN, m, n_OUT, r_OUT). There are r_IN + m + r_OUT switches arranged in 3 stages: r_IN input switches of size n_IN x m, m middle switches of size r_IN x r_OUT, and r_OUT output switches of size m x n_OUT. The number of input-output ports and connectivity of the switches are as shown.
Theorem A Clos network with RNB switch modules is RNB iff m >= max(n_IN, n_OUT). A Clos network with SNB switch modules is SNB iff m >= n_IN + n_OUT - 1.
The total number of input lines is n_IN x r_IN. The total number of output lines is n_OUT x r_OUT.
The Clos network in the figure is SNB. It has 9 input lines and 8 output lines.
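The two nonblocking conditions are easy to check mechanically. A minimal sketch, using the parameter names of the theorem above (the concrete example values are assumptions for illustration):

```python
def clos_rnb(m, n_in, n_out):
    """Rearrangeably nonblocking: m >= max(n_IN, n_OUT) middle switches."""
    return m >= max(n_in, n_out)

def clos_snb(m, n_in, n_out):
    """Strictly nonblocking (Clos condition): m >= n_IN + n_OUT - 1."""
    return m >= n_in + n_out - 1

# Assumed example with n_IN = n_OUT = 3: m = 3 already gives RNB,
# while SNB requires m >= 5.
print(clos_rnb(3, 3, 3), clos_snb(3, 3, 3))  # True False
print(clos_snb(5, 3, 3))  # True
```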
7.3 Recursive construction
Figure 7.4: Recursive construction of an RNB Clos network. N = p x q; q planes of p x p switches form the first and third stages, with q x q middle switches.
Figure 7.4 is an N x N RNB switch if each module is RNB.
Figure 7.5: The Benes switch: an N x N network built recursively from two N/2 x N/2 switches; fully expanded, it has 2 log2(N) - 1 stages of N/2 2x2 switches.
Figure 7.5 is an N x N RNB switch made up of 2 x 2 switch modules.
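The module budget implied by Figure 7.5 is a quick count. A sketch, assuming N is a power of 2:

```python
from math import log2

def benes_stages(n):
    """Number of stages when the recursion is fully expanded: 2 log2(N) - 1."""
    return 2 * int(log2(n)) - 1

def benes_modules(n):
    """Total 2x2 modules: (2 log2(N) - 1) stages of N/2 switches each."""
    return benes_stages(n) * (n // 2)

for n in (2, 4, 8, 16):
    print(n, benes_stages(n), benes_modules(n))
# e.g. N = 8 gives 5 stages and 20 modules
```

The O(N log N) module count is what makes the Benes construction attractive compared with an N x N crossbar's N^2 crosspoints.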
Figure 7.7 illustrates an algorithm to rearrange existing connections in order to accommodate a new
connection.
Question 1: Can you supply a proof?
Question 2: Is there an algorithm to accommodate new connections in an arbitrary network of Figure 7.1?
Figure 7.7: Algorithm to add a new connection in an RNB switch
In a Benes switch, feasible flows may require multiple paths. Figures 7.8 and 7.9 show this.
Figure 7.8: Split flow 1
Figure 7.9: Split flow 2
Figure 7.10: Max flow for a single commodity is 3 and flows are integers; in the multi-commodity case, max flows are 0.5 and non-integer
In a Clos switch, permutations can be achieved without splitting flows. In the general multi-commodity case this is not so. Figure 7.10 shows that if this is a single commodity problem, the maximum flow is 3 and all flows are 1 (integer).
However, if the flows are distinct commodities, the max flows are 0.5 each, and not integer.
Figure 7.11: Two copies of figure 7.10 are connected in parallel. Achieving flows of 1,1,1 requires splitting
Figure 7.11 shows that a feasible permutation may require splitting flows. The green and cyan flows must be split across the parallel copies, similarly to the red flow.
Bibliography
[1] J. Walrand and P. Varaiya. Chapter 12, Switching. High performance communication networks
2nd edition, 2000.
[2] M.J. Karol, M. Hluchyj and S. Morgan. Input vs output queueing on a space-division packet
switch. IEEE Trans Comm, COM-35(12): 1347-56, Dec. 1987.
[3] T.E. Anderson, S. Owicki, J. Saxe and C.P. Thacker. High-speed scheduling for local area
networks. ACM Trans Computer Systems, 11(4):319-52, Nov. 1993.
[4] N. McKeown. iSLIP: a scheduling algorithm for input-queued switches. IEEE Trans Network-
ing, 7(2), April 1999.
[5] N. McKeown, V. Anantharam and J. Walrand. Achieving 100% throughput in an input-queued
switch. Proc. Infocom 96, vol 1: 296-302.
[6] B. Prabhakar and N. McKeown. On the speedup required for combined input and output
queued switching. Automatica, 35(12), Dec. 1999.
[7] J.F. Hayes, R. Breault and M.K. Mehmet-Ali. Performance analysis of a multicast switch.
IEEE Trans Comm, COM-39(4): 581-87, April 1991.
[8] B. Prabhakar, N. McKeown and R. Ahuja. Multicast scheduling for input-queued switches. J.
Selected Areas in Comm 15(5):855-66, June 1997.
[9] M. Waldvogel, G. Varghese, J. Turner and B. Plattner. Scalable high speed IP routing lookups.
ACM Sigcomm 97, September 1997.
[10] A. Demers, S. Keshav and S. Shenker. Analysis and simulation of a fair queueing algorithm.
ACM Sigcomm 89 Computer Communication Review, 19(4): 1-12, 1989.
[11] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of inte-
grated services networks: the single node case. IEEE Trans Networking, 1(3): 344-57, June
1993.
[12] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of inte-
grated services networks: the multiple node case. IEEE Trans Networking, 2(2): 137-50, April
1994.
[13] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE
Trans Networking, 1(4): 397-413, August 1993.
[14] I. Stoica, S. Shenker and H. Zhang. Core-stateless fair queuing: achieving approximately fair
bandwidth allocations in high speed networks. ACM Sigcomm 98, 1998.
[15] W. Bux, W.E. Denzel, T. Engbersen, et al. Technologies and building blocks for fast packet
forwarding. IEEE Communications Magazine, 39(1): 70-77, January 2001.
[16] P.R. Kumar and S. Meyn. Stability of queuing networks and scheduling policies. IEEE Trans.
Automatic Control, 40(2), February 1995.
[17] A. Deb. Building a network-processor based system. Integrated Communications Design,
December 2000. Available at www.icdmag.com.