Bridges Routers


  • 8/4/2019 Bridges Routers


    Chapter 1

    Bridges, Switches, Routers

    1.1 Introduction

Packet vs. circuit (and virtual circuit) switching

Network: mesh interconnection of links and switches

LANs: multiaccess, broadcast or shared-medium (Ethernet: 10BT–1000BT, Cat 3 UTP)

WANs: switches connected by point-to-point links

Packet processors: bridges, routers, ATM switches


Figure 1.1: Packet processor functions may involve the data path (switching, policing, scheduling — per-packet processing) or the control path (routing, congestion control, reservation).

    1.2 Packet processor functions

Routing: creating and distributing information that defines the path between source and destination, and determining the best path

Switching: per-packet forwarding decisions, and sending the packet towards its destination

Other functions: congestion control, reservations, policing, scheduling

Control functions are performed infrequently; datapath functions are performed per packet.


Figure 1.2: Bridged extended LAN and corresponding graph: bridges B1–B4 interconnect LAN segments L1–L5, with link costs of 10 or 20; D marks the designated port for a LAN, R the root port of a bridge. The bridge forwards frames along the spanning tree, according to the FDB.

STP:

1. Determine the root bridge, and set its ports in forwarding mode.

2. Each bridge determines its root port, and sets it in forwarding mode.

3. Bridges determine the designated port for each LAN segment.

4. All other ports are in the blocked state.

    1.3 Transparent bridging IEEE 802.1D

Ethernet LANs broadcast each packet to every device on the LAN, so the throughput per host decreases with the number of hosts connected to the LAN. See Problem 1.

Transparent bridging prevents this by interconnecting LAN segments (collision domains) and forwarding unicast packets according to a filtering database (FDB). Broadcast, multicast, and unknown unicast packets are flooded to all LANs, so all segments form a single broadcast domain.

A bridge has two or more ports. Packets from incoming ports are forwarded to outgoing ports along a spanning tree to prevent loops, according to the FDB. See Figure 1.2.

spanning tree algorithm: one root, then shortest path to root;

learning process: produces the FDB by relating each MAC source address to its incoming port, and removing unrefreshed entries.

Bridges exchange configuration messages to establish the topology, and topology-change messages to indicate that the STA should be rerun.

With a fixed number of bridge ports, the throughput per LAN segment decreases with the number of segments in an extended LAN. See Problem 2.
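The learning process and FDB-based forwarding above can be sketched as follows. This is a minimal illustration, not the 802.1D state machine; the class and names (`Bridge`, `AGEING_TIME`) are hypothetical, and only known-unicast filtering, flooding, and entry ageing are modeled.

```python
import time

AGEING_TIME = 300.0  # seconds; ageing time for unrefreshed FDB entries

class Bridge:
    """Toy learning bridge: FDB maps MAC address -> (port, last-seen time)."""
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.fdb = {}

    def receive(self, frame_sa, frame_da, in_port, now=None):
        now = time.time() if now is None else now
        # Learning: relate the MAC source address to the incoming port.
        self.fdb[frame_sa] = (in_port, now)
        # Remove unrefreshed entries.
        self.fdb = {mac: (p, t) for mac, (p, t) in self.fdb.items()
                    if now - t <= AGEING_TIME}
        # Forwarding: known unicast goes out one port; unknown unicast
        # (like broadcast and multicast) is flooded to all other ports.
        entry = self.fdb.get(frame_da)
        if entry is not None:
            port, _ = entry
            # Filter if the destination is on the incoming segment.
            return [port] if port != in_port else []
        return [p for p in range(self.num_ports) if p != in_port]
```

A frame to an unknown destination is flooded; once the destination has been heard from, later frames are forwarded out a single port.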


    Figure 1.3: LAN vs VLAN topology

    Figure 1.4: VLAN tags

    1.4 LAN switches IEEE 802.1Q

A LAN switch is a bridge with as many ports as there are LAN segments, and with enough capacity to handle the traffic on all segments. Problem 2 is solved through VLANs.

A virtual LAN or VLAN is a collection of LAN segments and attached devices with the properties of an independent LAN. Each VLAN is a separate broadcast domain: traffic on one VLAN is restricted from going to another VLAN. Traffic between VLANs goes through a router.

VLAN tags carrying a VLAN ID or VID (a 4-byte tag) are added to MAC frames so switches can forward packets to ports with the same VID. The FDB is augmented to include, for each VID, the ports (the member set) through which members of that VLAN can be reached.


The member set is derived from VLAN registration information: (i) explicitly by management action, or (ii) by the GARP VLAN registration protocol (GVRP). GARP is the generic attribute registration protocol.

Multicast filtering A VLAN is a single broadcast domain. If multicast messages are broadcast, the throughput is limited by the slowest link: a switch with 124 10-Mbps ports has a capacity of 1.24 Gbps but can transmit at most 6 1.5-Mbps multicast video channels. The GARP Multicast Registration Protocol (GMRP) (IEEE 802.1p) allows switches to limit multicast traffic along the spanning tree. (See IGMP.)

JOIN a host sends this message to express interest in joining a multicast group. The switch adds the port to the multicast group and forwards the multicast source to these ports. JOIN messages are resent once every JOINTIME timeout.

LEAVE message sent by a host. The switch removes this port from the multicast group unless another host on that port sends a JOIN message before the LEAVETIME timeout.

LEAVEALL message periodically sent by the switch.

When a host sends IP data to a multicast (Class D IP) address, the host inserts the low-order 23 bits of the IP address into the low-order 23 bits of the MAC address, so a NIC that is not part of the group ignores these data.
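The 23-bit mapping above (RFC 1112: the low-order 23 bits of the Class D address are placed into the multicast MAC prefix 01:00:5e) can be computed directly; the function name is illustrative:

```python
def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast (Class D) address to its multicast MAC address."""
    octets = [int(x) for x in ip.split(".")]
    addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    low23 = addr & 0x7FFFFF           # keep only the low-order 23 bits
    mac = 0x01005E000000 | low23      # OR into the fixed 01:00:5e prefix
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -8, -8))
```

Since 5 bits of the IP address are discarded, 32 different Class D addresses share each MAC address, so the NIC's filter is only a first-level filter.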


    Figure 1.5: The IP header provides precedence and type of service fields

Quality of service The 3-bit precedence field allows 8 priority levels. The ToS bits are D (minimize delay), T (maximize throughput), R (maximize reliability), and C (minimize cost).

802.1 provides no support for priority. 802.1p provides in-band QoS signalling with 8 CoS levels; a conforming bridge or switch maintains 8 queues. (VLAN tags may also carry priority information.)


    1.5 Problems

1. "The throughput per host decreases with the number of hosts connected to the LAN." Formulate two mathematical models, one deterministic and one stochastic, in which this quote is an assertion. Then prove or disprove the assertion. You will have to model LAN speed, host load, and throughput.

Hint: Use the M/M/1 model of Section 3.3.

2. Following Figure 1.2, propose a graph model for an extended bridged LAN in which bridges may have multiple ports.

(a) Use the graph to formulate two mathematical models, one deterministic and one stochastic, within which one can determine the throughput per LAN segment.

(b) How would you formulate as a mathematical assertion the statement "the throughput per LAN segment increases with the number of ports in the bridge"?

Hint: Try the Jackson network model of Section 3.3.

3. Discuss the differences between STP and OSPF in terms of throughput or efficiency in link utilization.


    Chapter 2

    Processor architecture

    2.1 Datapaths

When a packet arrives at a bridge:

The DA is searched in the forwarding table (DA → output ports). If not found, the packet is broadcast to all output ports;

If found, it is forwarded across the switching fabric to the appropriate output port (or ports, for multicast);

The SA is learned and added to the forwarding table;

During transfer to the fabric the packet may be stored, or dropped if storage is full;

The packet is stored in the output port queue (usually FIFO) and eventually transmitted.

When a packet arrives at a router:

The DA is searched in the forwarding table. If not found, the packet is dropped;

If found, the next-hop MAC address is appended, the TTL is decremented, a new Header Checksum is calculated, and the packet is forwarded across the switching fabric to the output port or ports;

During transfer to the fabric the packet may be stored: if storage is full, this (or another) packet may be dropped;

The packet is stored in the output queue (FIFO or more complex) and eventually transmitted.

When a cell arrives at an ATM switch:

Its VCI is searched in the forwarding table (VC translation table: (VCI in, Port in) → (VCI out, Port out)). If not found, the cell is dropped;


If the VCI is policed, the policing function determines whether the cell is conformant; if not, it may be dropped. If conformant, the cell is forwarded across the switching fabric to the output port;

During transfer, the cell may be stored: if storage is full, this or another cell may be dropped;

The cell is stored in the output queue and eventually transmitted. The service discipline may be FIFO or very elaborate.


Figure 2.1: Basic packet processor architectures A, B, C, D, built from CPUs, memory, and line cards.

Throughput in A is limited by CPU speed;

In B, there is a choice of which CPU forwards the packet;

In C, the packet travels the bus only once, so throughput is limited by bus speed;

In D, several packets can be forwarded through the crossbar in parallel.

General-purpose CPUs are not well suited for applications in which packets flow through once. CPUs are better when the same data are examined several times, making use of the cache.


Figure 2.2: Elaboration of datapath functions. Data path (per-packet processing): forwarding decision, switching fabric, policing, scheduling. Control path: routing, congestion control, reservation.

    2.2 Performance

The packet delay through a switch fabric consists of the time (1) for the forwarding decision, and (2) to transfer the packet across the switch.

The packet delay through a processor consists of the time (1) for the policing decision, (2) for the forwarding decision, (3) to transfer across the switch, and (4) for the output scheduling decision.


Figure 2.3: Delay of switch and packet processor: header arrival time, forwarding decision time, switch transfer time, and output scheduling decision time, against packet size, minimum back-to-back packet size, and packet arrival rate.

    2.3 Forwarding decision

Criteria: (1) speed of address lookups, which depends on the number of memory references; (2) size of memory.

ATM switches perform direct lookup, Figure 2.4. The VCI address space is 2^24 = 16 M. Most switches contain far fewer entries, since it is the downstream switch that chooses a VCI that fits in the supported address space (PNNI).

For multicast, the lookup returns a list of output ports, each with a different VCI.

Figure 2.4: ATM switches perform direct lookup: the VCI addresses a DRAM whose data is (port, new VCI).


Figure 2.5: CAM or content-addressable memory. The 48-bit MAC (network) address is presented. A successful parallel search asserts the hit signal and returns a pointer (log2 N bits for a size-N memory) to the entry where the forwarding information for the MAC address is stored.

Bridges The address space is 2^48, so direct lookup is not possible. Three indirect lookup techniques:

Associative memory. Figure 2.5. Typical CAM sizes are limited to a few thousand entries, so CAMs are not suitable for large LANs that require far more entries.


Figure 2.6: A 48-bit address is presented; the hashing function returns a pointer (16 bits, log2 N) into a DRAM holding N linked lists that cover M addresses. The search through a linked list takes a random time proportional to the length of the list.

Hashing. For large LANs hashing is an option. Suppose the LAN has M hosts. A hashing function, h, maps a host's 48-bit address to a forwarding table with, say, N entries, as in Figure 2.6.

Two addresses x ≠ y may collide: h(x) = h(y). The entry points to a linked list of (MAC address, forwarding data) pairs for the MAC addresses that map into the same entry. The list must be searched sequentially to locate the MAC address. The duration of the search is proportional to the length of the list.

Suppose h maps the M MAC addresses x_1, ..., x_M into the N linked lists 1, ..., N. Assume that h(x_1), ..., h(x_M) are independent and uniformly distributed over {1, ..., N}. The length of the i-th list is the random number

L_i = Σ_{j=1}^{M} 1{h(x_j) = i}.   (2.1)

Let ρ = M/N. If ρ is small (number of lists larger than number of possible addresses), the lists will usually have 0 or 1 elements. Problem 3 asks to find the distribution of L_i. The mean length of a list is ρ = M/N. However, the L_i being random, there is a chance that some lists (and the corresponding search time) may be very large. For real-time applications, you may store forwarding tables in such a way (e.g. as trees) that retrieval has a deterministic bound.


Prefix            Outgoing port
128.32.0.0/16     1
128.32.239.0/24   7
128.32.239.3/32   3

Figure 2.7: Forwarding table with CIDR

IP routers. With CIDR, router forwarding table entries are identified by a pair (route prefix / prefix length), with prefix length between 0 and 32 bits. See Figure 2.7. The entry 128.32.0.0/16 is a 16-bit-long entry.

The forwarding decision must find the longest prefix match between the packet's destination IP address and the prefixes in the forwarding table.

CIDR reduces the table size, but the forwarding decision is more complex. See [9].

With declining memory cost, it may be more economical to expand the prefixes and use simpler, exact-matching algorithms.
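A minimal longest-prefix match over the table of Figure 2.7, done here by linear scan from the most specific matching prefix (real routers use tries, CAMs, or the expanded exact-match tables mentioned above; this sketch only illustrates the matching rule):

```python
import ipaddress

TABLE = [  # (prefix, outgoing port), from Figure 2.7
    (ipaddress.ip_network("128.32.0.0/16"), 1),
    (ipaddress.ip_network("128.32.239.0/24"), 7),
    (ipaddress.ip_network("128.32.239.3/32"), 3),
]

def lookup(dst: str):
    """Return the port of the longest matching prefix, or None (drop)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in TABLE:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return best[1] if best else None
```

Note how 128.32.239.7 matches both the /16 and the /24 but is forwarded on port 7, the longer match.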


Caching. The forwarding decision delay can be reduced by caching. The idea is that the IP destination addresses of successive packets are correlated.

The cache stores the full source and destination IP addresses and the corresponding forwarding decision (perhaps including the entire replacement IP header).

When a packet arrives, the SA and DA are used to do a full match in the local cache. If the addresses are not there, the packet is forwarded to a central routing processor. A cache replacement rule is needed when there is a cache miss.

The improvement in delay depends on (1) the ratio of the cache size to the size of the forwarding table, and (2) the temporal locality. The latter is likely to be higher in a campus router than in an edge router, and higher there than in a core router. See Problem 4.

Multicast. Some routers support multicast. The simplest rule is RPF (reverse-path forwarding): if a multicast packet arrives on port p from source S, look up S in the forwarding table. If p is the best port to reach S, forward the packet on all ports except p.

Switching fabrics Need some queuing models.


    2.4 Problems

    1. For a commercial LAN switch, find the various times in Figure 2.3. Also give the throughput.

    See, for example, www.bcr.com/bcrmag/08/98p25.htm

    2. If forwarding decision, switch transfer, and output scheduling can be pipelined, what is the

    throughput of the processor?

3. Find the (marginal) distribution of the list lengths L_i given in (2.1), and calculate the mean length of a list. Show that for ρ small, the mean is approximately ρ.

Find the joint distribution of (L_1, ..., L_N). Verify that it has the product form, where the denominator is the normalizing constant.

For given values of M and N, find the probability that a list is much longer than the mean.

Suppose a memory access takes 100 ns. Consider back-to-back Ethernet packets. What is the average throughput of this switch, using the model of Figure 2.3 and ignoring the output scheduling decision delay?

4. The packets arriving at a line card belong to several multiplexed TCP connections.

(a) Formulate a model of packet arrivals with, say, K simultaneous connections, in which connections last a random amount of time with a geometric distribution and a given mean.

(b) Suppose the size of the cache is C. If there is a cache miss, an existing entry is replaced by the missing entry. How would you calculate the hit ratio as a function of C?

(c) Suppose you are given a typical trace of the addresses of packet arrivals, but no model of the arrival process. You want to know how big a cache you would need so that the hit ratio is a certain value. What would you do?

(d) The time to search the cache is T1, the time to search the central forwarding table is T2, and the hit ratio is h. How would you decide whether it's worth having a cache?


    Chapter 3

    Queuing

    3.1 Discrete time Markov chains

X = {X_t : t = 0, 1, ...} is a Markov chain with state space S, finite or countable, stationary transition probability matrix P = {P(i, j)}, and initial distribution π_0. So

P(X_{t+1} = j | X_t = i, X_{t−1}, ..., X_0) = P(i, j)   (3.1)

for all i, j, t.

π_t is the marginal distribution of X_t, written as a row vector. From (3.1),

π_{t+1} = π_t P.   (3.2)

π is invariant if it satisfies the balance equations

π = π P.   (3.3)

X is irreducible if it can go from any state i to any other state j (with positive probability). Irreducible chains have at most one invariant distribution. The chain is positive recurrent if it has one invariant distribution.

If X is irreducible and positive recurrent with invariant distribution π,

(1/T) Σ_{t=1}^{T} 1{X_t = i} → π(i) as T → ∞,   (3.4)

i.e. π(i) is the fraction of time X spends in state i.

X is aperiodic if d = 1, where d = gcd{t ≥ 1 : P^t(i, i) > 0}. If d > 1, X is periodic with period d.


3.2 Continuous-time Markov chains

Figure 3.2: A trajectory x_t of the process in the theorem below, with sets S1, S2, S3 at times t1 < t2 < t3.

Theorem (Markov property) For any set A of trajectories,

P({X_s, s > t} ∈ A | X_s, s ≤ t) = P({X_s, s > t} ∈ A | X_t).

Such an A is of the form {X_{t1} ∈ S1, X_{t2} ∈ S2, X_{t3} ∈ S3}, with t < t1 < t2 < t3. See Figure 3.2.


X is irreducible if it can go from any state i to any other state j, i.e. if its rate matrix Q is irreducible, where Q(i, j) is the jump rate from i to j ≠ i and Q(i, i) = −Σ_{j ≠ i} Q(i, j).

Theorem Suppose X is a continuous-time Markov chain with rate matrix Q and initial distribution π_0. Then

1. π is invariant (π_t = π for all t) iff the balance equations hold:

π Q = 0;   (3.6)

2. X has at most one invariant distribution π, and then the fraction of time spent in state i converges to π(i);

3. If X has no invariant distribution, the fraction of time spent in each state converges to 0.


Theorem (Time reversal) Suppose X is a stationary continuous-time Markov chain with rate matrix Q and distribution π. The time-reversed process X̃_t = X_{−t} is stationary, Markov, with distribution π and rate matrix Q̃, where

π(i) Q̃(i, j) = π(j) Q(j, i).


Figure 3.3: Diagrams for the M/M/1 system: the state transition diagram on states 0, 1, 2, 3, ... and a sample path x_t. Arrivals (blue) and departures (red) form Poisson processes.

3.3 M/M/1 model

See Figure 3.3. The balance equation (3.6) is

λ π(n) = μ π(n + 1), n = 0, 1, ...,

which has a (unique) solution iff ρ = λ/μ < 1:

π(n) = (1 − ρ) ρ^n, with ρ = λ/μ.   (3.7)


The queue x_t is time-reversible, because λ π(n) = μ π(n + 1), so the rate matrix of the time-reversed process, Q̃, is the same as that of x.

So the departures before time t form a Poisson process with rate λ, independent of x_t. Surprise!

The mean queue length is

E[x] = Σ_n n π(n) = ρ / (1 − ρ).

For ρ = 10/11 ≈ 0.91, the mean is 10 packets.

Above,
λ = av. number of (exponentially spaced) packet arrivals per sec;
μ = av. number of packets that can be transmitted per sec;
ρ = λ/μ = av. utilization.


A packet arriving at time t sees x_t packets in queue, with

P(x_t = n | packet arrives in (t, t + dt)) = π(n),

so the average time between arrival and departure (including the packet's service or transmission time) is

T = Σ_n π(n) (n + 1)/μ = 1/(μ − λ).

Alternatively, T = E[x]/λ by Little's law.

Example Consider a 10 Gbps link. Packet lengths are exponentially distributed with mean length 10,000 bits.1 So μ = 10^6 packets/s and 1/μ = 1 μs per packet.

Link utilization is 90 percent, i.e. ρ = 0.9. Then the average number of packets in the buffer is ρ/(1 − ρ) = 9. The average delay faced by a packet, including its own service (transmission) time, is 1/(μ − λ) = 10 μs.

If the packet goes through 10 nodes, the average delay is 100 μs (assuming independence of nodes).

For a 100 Mbps link, with the same packet length distribution, μ = 10^4 packets/s, 1/μ = 100 μs/packet, and the average delay is 1000 μs per link.

The probability of 100 or more packets in the buffer is ρ^100 = 0.9^100 ≈ 2.7 × 10^−5.

Compare the queuing delay with the propagation delay of 5 μs/km = 15 ms for a 3,000 km link.

The possible number of bits in flight in the 3,000 km, 10 Gbps link is 15 ms × 10 Gbps = 1.5 × 10^8.

1What is a more realistic distribution?
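The example's numbers follow directly from the M/M/1 formulas; the short calculation below reproduces them (under the same exponential-packet-length assumption):

```python
# M/M/1 example: 10 Gbps link, mean packet length 10,000 bits, 90% load.
link_bps = 10e9
mean_pkt_bits = 10_000.0

mu = link_bps / mean_pkt_bits          # service rate: 1e6 packets/s
rho = 0.9                              # utilization
lam = rho * mu                         # arrival rate

mean_in_buffer = rho / (1 - rho)       # average number of packets: 9
delay_s = 1 / (mu - lam)               # average delay incl. service: 10 us
p_tail = rho ** 100                    # P(100 or more packets in buffer)
bits_in_flight = 15e-3 * link_bps      # 15 ms propagation at 10 Gbps
```

The tail probability shows why a modest buffer suffices at this load, while `bits_in_flight` shows that the propagation pipe dwarfs the queue.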


Alternative formulation Let A_t be a Poisson counting process with rate λ — the arrival process. Let S_t be a Poisson counting process with rate μ — the virtual service process. A and S are independent.

The queue at time t is given by

x_t = x_0 + A_t − D_t,

where the departure counting process is

D_t = ∫_0^t 1{x_s > 0} dS_s.

D is also Poisson. Moreover,

Future arrivals, {A_s − A_t : s > t}, and the current state, x_t, are independent;

Past departures, {D_s : s ≤ t}, and the current state, x_t, are independent.


Figure 3.4: Parameters of a Jackson network: external traffic enters line i at rate γ_i pkt/sec, line i transmits at rate μ_i pkt/sec, and traffic from the network is routed from line j to line i with probability r(j, i).

Jackson network See Figure 3.4. Assumptions:

Independent, exponential service times with rate μ_i at node i;

Markovian routing: a packet leaving node j joins node i with probability r(j, i);

Poisson external arrivals at rate γ_i packets/sec;

The aggregate arrival rate λ_i into node i satisfies

λ_i = γ_i + Σ_j λ_j r(j, i), all i.   (3.8)

Let x_t = (x_t(1), ..., x_t(J)) be the queue-length process. This is Markovian. Problem 5 asks to find its rate matrix.
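The traffic equations (3.8) can be solved by fixed-point iteration, λ ← γ + λR, which converges when every packet eventually leaves the network. The 3-node topology below is a hypothetical example, not one from the text:

```python
def traffic_rates(gamma, R, tol=1e-12):
    """Solve lambda_i = gamma_i + sum_j lambda_j * R[j][i] (eq. 3.8)."""
    n = len(gamma)
    lam = list(gamma)
    while True:
        nxt = [gamma[i] + sum(lam[j] * R[j][i] for j in range(n))
               for i in range(n)]
        if max(abs(nxt[i] - lam[i]) for i in range(n)) < tol:
            return nxt
        lam = nxt

gamma = [1.0, 0.0, 0.0]        # external traffic enters only at node 0
R = [[0.0, 0.5, 0.5],          # node 0 splits traffic to nodes 1 and 2
     [0.0, 0.0, 0.0],          # nodes 1 and 2 route all traffic out
     [0.0, 0.0, 0.0]]
lam = traffic_rates(gamma, R)  # aggregate rates: [1.0, 0.5, 0.5]
```

With these λ_i and given μ_i, the product-form theorem below gives the invariant distribution directly, ρ_i = λ_i/μ_i per node.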


Theorem Assume λ_i < μ_i, all i. Then x has an invariant distribution of the product form

π(n_1, ..., n_J) = π_1(n_1) ⋯ π_J(n_J), where π_i(n) = (1 − ρ_i) ρ_i^n with ρ_i = λ_i / μ_i.

This is a surprising result. The departures from a node in the Jackson network need not be Poisson, unlike the case of a single M/M/1 system.


Figure 3.5: The M/M/m/∞ system: states 0, 1, 2, 3, ...; an arriving request is routed to the first free server, and the departure rate from state k is min(k, m) μ.

3.4 Other M/M/m/n models

M/M/m, the m-server case The received request is routed to the first of m available servers, Figure 3.5. The buffer is infinite. The balance equations are

λ π(k) = min(k + 1, m) μ π(k + 1), k = 0, 1, ....

This gives

π(k) = π(0) (mρ)^k / k! for k ≤ m, and π(k) = π(m) ρ^{k−m} for k ≥ m,   (3.9)

where ρ = λ/(mμ). It is assumed that ρ < 1. π(0) is obtained using Σ_k π(k) = 1.

A packet arriving at time t sees all servers busy (x_t ≥ m) with probability

P(x_t ≥ m) = Σ_{k ≥ m} π(k) = π(m)/(1 − ρ),

from (3.9). The expected number of packets waiting in queue (not in service) is

E[Q] = Σ_{k > m} (k − m) π(k) = π(m) ρ/(1 − ρ)^2.


By Little's law (see below), the average waiting time in queue (not in service) is W = E[Q]/λ, and the total latency (waiting plus service time) is T = W + 1/μ.


3.5 Little's law

Suppose A(t) is the cumulative number of arrivals in [0, t] into a stable queueing system and x(t) is the number of packets in the system (including those in service). Let W_n be the latency of packet n, and λ the arrival rate.

Suppose the queue is empty at t = 0 and at t = T. From Figure 3.6, the time average of the queue size is

(1/T) ∫_0^T x(t) dt = (1/T) Σ_{n=1}^{A(T)} W_n = (A(T)/T) · (1/A(T)) Σ_{n=1}^{A(T)} W_n.

Taking limits as T → ∞, and if time averages equal ensemble averages, we get

E[x] = λ E[W].

Figure 3.6: Calculations for Little's law: the area under x(t) decomposes into the latencies W_n of the successive packets S_1, S_2, ....


    3.6 PASTA

    We have used the PASTA property (Poisson arrivals see time averages) several times.

    Consider stationary queuing system with deterministic service time of 3 and periodic arrivals (period

    10). A sample path with arrivals at 1,2,3,11,12,13,21,22,23,

    and queue process

    is shown in

    figure 3.7.

    Let

    be the probability that

    at any time

    , and let

    be the probabililty that an

    arriving packet sees

    packets in queue. For this system,

    so the two probabilities are not the same.

    1 2 3 4 5 6 7 1110

    x(t)

    Figure 3.7: PASTA property does not hold in this deterministic queuing system

    Consider a M/G/1 system, with stationary probabilities

    . Let

    be the probability that an

    arrival sees

    packets in queue. Then,

    packet arrives in

    packet arrives in

    packet arrives in

    using Bayes rule, independence of arrivals after

    from

    , and independence of service

    times.


Figure 3.8: Deriving the Pollaczek-Khinchin formula: the remaining waiting time W(t), with the service times S_n and waiting times W_n of successive packets.

3.7 Pollaczek-Khinchin formula

Consider an M/G/1 system with independent service times S_1, S_2, ..., and Poisson arrivals with rate λ. Let W(t) be the remaining waiting time, i.e. the amount of time needed to serve the packets in the system at time t. Let S_n and W_n be the service time and waiting time of packet n; see Figure 3.8.

In the time average of W(t), packet n contributes a triangle of area S_n^2/2 and a parallelogram of area W_n S_n, so

(1/T) ∫_0^T W(t) dt = (1/T) Σ_{n=1}^{A(T)} (S_n^2/2 + W_n S_n).

Substituting and taking limits as T → ∞,

E[W(t)] = λ E[S^2]/2 + λ E[W] E[S].

By PASTA, E[W(t)] = E[W] = the waiting time faced by an arrival. So,

E[W] = λ E[S^2] / (2 (1 − ρ)),

where ρ = λ E[S] is the utilization.

Note: The formula E[Σ_{n=1}^{A(T)} S_n] = E[A(T)] E[S], involving a sum of a random number of terms, is sometimes called Wald's formula. A general version of Wald's formula is a consequence of the fact that Σ_{n ≤ k} (S_n − E[S]) is a martingale. See Problem 8.

Determinism minimizes waiting In general, E[S^2] ≥ (E[S])^2, so

E[W] = λ E[S^2]/(2(1 − ρ)) ≥ λ (E[S])^2/(2(1 − ρ)),

where the last expression is the waiting time for a deterministic service time (e.g. ATM cells).


    Chapter 4

    Switching

    4.1 Packet switching

Architectures: IQ/HOL, VoQ, SQ


Figure 4.1: Packet switch architectures. IQ: HOL blocking; OQ: faster switch; VoQ: matching; SQ: reduced buffer size.

    4.1.1 Architectures

The second-generation PRIZMA architecture, with 2 Gbps ports ... all on one chip [15].


Figure 4.2: Virtual HOL queue: arrivals A_t from unblocked queues join the HOL queue x_t, which serves min(1, x_t) per slot; the accompanying plot shows average delay in cell times versus offered load.

4.1.2 Input queues

Assume:

discrete time t = 0, 1, ...; independent arrivals; uniform destinations with probability 1/N;

N large, so the total number of port-1 arrivals per slot is Poisson.

Virtual HOL queue of x_t port-1 packets at the heads of the input queues:

x_{t+1} = x_t − min(1, x_t) + A_t,   (4.1)

where A_t = number of new port-1 packets that come to the heads of unblocked queues.

In equilibrium, taking expectations in (4.1) gives E[A] = E[min(1, x)], so A_t is Poisson with mean ρ = E[A] equal to the per-port throughput. Square (4.1) and take expectations:

E[x] = ρ(2 − ρ) / (2(1 − ρ)).   (4.2)

Under saturation each of the N inputs holds exactly one HOL packet, so the N virtual HOL queues together hold N packets and E[x] = 1. From (4.2), ρ^2 − 4ρ + 2 = 0, i.e.

ρ = 2 − √2 ≈ 0.586.

That is, about 41% of the switch bandwidth is not utilized.

Quick upper bound Same switch, but at the end of each cycle the IQs are flushed. With N → ∞, the switch throughput is N(1 − (1 − 1/N)^N) → N(1 − e^{−1}).

The per-port throughput is 1 − 1/e ≈ 0.632.
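The saturation throughput follows from setting the mean HOL backlog of (4.2) to 1, and the flushed-switch bound from the limit above; both are a couple of lines to verify:

```python
import math

# HOL saturation: rho*(2 - rho) / (2*(1 - rho)) = 1 gives
# rho^2 - 4*rho + 2 = 0, whose root in (0, 1) is 2 - sqrt(2).
rho = 2 - math.sqrt(2)
backlog = rho * (2 - rho) / (2 * (1 - rho))   # should equal exactly 1
wasted = 1 - rho                              # unutilized fraction, ~0.414

# Flushed-IQ upper bound: expected fraction of distinct destinations
# among N uniform choices, N*(1 - (1 - 1/N)**N)/N -> 1 - 1/e.
N = 1_000_000
per_port = 1 - (1 - 1 / N) ** N               # close to 1 - 1/e ~ 0.632
```

The two numbers bracket the familiar IQ-switch story: 0.586 with HOL blocking, at most about 0.632 even with per-cycle flushing.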


4.1.3 Virtual output queue

Each input port has N VoQs, one per output port.

If several input ports have packets for the same destination, which one should be served?

Assume i.i.d. arrivals A_t = (A_t(i, j)) with rates λ(i, j) = E[A_t(i, j)] such that

Σ_i λ(i, j) < 1 and Σ_j λ(i, j) < 1.

Service S_t = (S_t(i, j)), with S_t(i, j) ∈ {0, 1}, such that

Σ_i S_t(i, j) ≤ 1 and Σ_j S_t(i, j) ≤ 1.

Note: If equality holds above, S_t is a permutation matrix over {1, ..., N}.

Queue lengths x_t = (x_t(i, j)) evolve as

x_{t+1}(i, j) = x_t(i, j) − min(S_t(i, j), x_t(i, j)) + A_t(i, j).

Question: Given λ, find S_t, based on past arrivals and queue lengths, so that x_t is stable.

Conjecture: There always exists a stabilizing matching S.
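One natural candidate for a stabilizing policy is to serve, each slot, the permutation of maximum total queue weight. The brute-force version below enumerates all N! permutations, so it is only a toy illustration of the rule for small N, not a practical scheduler:

```python
from itertools import permutations

def max_weight_matching(x):
    """Pick the permutation S maximizing sum_i x[i][S(i)] (brute force)."""
    n = len(x)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        w = sum(x[i][perm[i]] for i in range(n))
        if w > best_weight:
            best_perm, best_weight = perm, w
    return best_perm, best_weight

# VoQ lengths x[i][j]: input i's queue of packets for output j.
x = [[3, 0, 1],
     [0, 2, 0],
     [1, 0, 4]]
perm, weight = max_weight_matching(x)
```

For this x the heaviest matching serves (0,0), (1,1), (2,2), with total weight 9; iSLIP-style schedulers below approximate this kind of matching distributively.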


Figure 4.4: Switch with time-division shared bus and centralized shared memory: inputs 1 to N and outputs 1 to N share queues 1 to N in a common memory; a packet from input 1 to output N crosses the shared bus.

4.2 Shared queue

This architecture is used in most low-speed packet processors: a time-division bus with a centralized memory shared by all input and output lines, Figure 4.4. Up to N packets may arrive at one time and up to N may be read at one time, so the memory bandwidth must be 2N times the line rate. Assuming a 100 ns DRAM access time and a 53-byte-wide bus, the total memory bandwidth is 53 × 8 bits / 100 ns = 4240 Mbps. For a 16-port ATM switch, this gives a line rate of 4240/(2 × 16) ≈ 132 Mbps.

Let x_t = size of the 1-list (packets for output 1) at the beginning of slot t, and A_t = # of 1-packets arriving in slot t:

x_{t+1} = x_t − min(1, x_t) + A_t.   (4.5)

Following the same argument that led to (4.2) gives

E[x] = (ρ − 2ρ^2 + E[A^2]) / (2(1 − ρ)),

where ρ = E[A] and E[A^2] is the second moment of A. For the Poisson case, E[A^2] = ρ + ρ^2, so

E[x] = ρ(2 − ρ) / (2(1 − ρ)).   (4.6)

Shared vs separate queue Suppose the shared buffer is sized at N E[x] + α√N σ, where σ is the standard deviation of x and α sets the overflow probability; by the central limit theorem the total occupancy concentrates around N E[x] with deviation √N σ. Separate buffers must each be sized at E[x] + ασ, for a total of N E[x] + αNσ. So sharing reduces the safety margin by a factor of √N.


    4.3 Output queue

In an output-queued switch the switch fabric must run N times the line rate, and the output memory must run (N + 1) times the line rate. The queue length in port 1 is given by (4.5).


    4.4 Problems

1. Assume that A_t is Poisson in (4.1) or (4.5). The mean queue size is given by (4.6).

(a) Is x_t Markov? Why?

(b) If x_t is stationary, how would you find its distribution?


    Chapter 5

    Matching

Crossbar switches need a controller to schedule the switch. The controller must find a good match, e.g. longest queue first, oldest cell first, etc.

It is too expensive to run a centralized maximum matching algorithm, with complexity O(N^2.5) or worse. (A 40-byte packet at a line speed of 1 Gbps allows only 320 ns per packet.)

So one may have to be satisfied with maximal matching, using distributed algorithms. Note that for a fully-connected bipartite graph, a maximal matching is also maximum.

In case of QoS, the matching must also satisfy some preferences.

5.1 The dating game

The GSA algorithm. Say that a man or woman is

free if she/he is not engaged or matched to any man/woman,
engaged if she/he is temporarily matched to some man/woman,
matched if she/he is terminally matched.

BEGIN: all are free. While some man m is free, m proposes to w, the first woman he has not yet proposed to. If w is free, w becomes engaged to m. Otherwise w is currently engaged to m': if w prefers m to m', w becomes engaged to m and m' is set free; if not, m continues free and proposes to the next woman on his list. When no man is free, the engaged pairs are matched. END.

Figure 5.1: The GS algorithm
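The GS flowchart translates almost line-for-line into code (a minimal sketch; ports are 0-based, preference lists are ordered most-preferred first, and the 2x2 example below is hypothetical):

```python
def gale_shapley(men_prefs, women_prefs):
    """Men propose in order of preference; each woman holds the best
    proposal seen so far. Returns a stable matching {man: woman}."""
    n = len(men_prefs)
    # rank[w][m] = position of man m on woman w's list (lower = preferred)
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_prop = [0] * n          # next woman on each man's list
    fiance = [None] * n          # fiance[w] = man currently engaged to w
    free = list(range(n))
    while free:
        m = free.pop()
        w = men_prefs[m][next_prop[m]]
        next_prop[m] += 1
        if fiance[w] is None:
            fiance[w] = m                        # w was free: engage
        elif rank[w][m] < rank[w][fiance[w]]:    # w prefers m to m'
            free.append(fiance[w])               # m' is set free
            fiance[w] = m
        else:
            free.append(m)                       # m continues free
    return {m: w for w, m in enumerate(fiance)}

# Both men prefer woman 0; woman 0 prefers man 1, so man 0 ends with woman 1.
print(gale_shapley([[0, 1], [0, 1]], [[1, 0], [0, 1]]))
# {1: 0, 0: 1}
```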

Figure 5.2: One RRM cycle on a 4 x 4 switch, showing the accept pointers a_i and grant pointers g_j.

5.2 Round-robin matching

Each input i maintains an accept pointer a_i. Each output j maintains a grant pointer g_j.

RRM cycle.

Step 1 Each input i requests every output j with L_ij > 0.

Step 2 Each output j grants the next requesting input at or after the current pointer value g_j, then increments g_j to one beyond the granted input (mod N).

Step 3 Each input i accepts the next granting output at or after the current pointer value a_i. If a grant has been accepted, a_i is incremented to one beyond the accepted output (mod N).

Figure 5.2 illustrates one RRM cycle. Initially, all a_i = 1 and all g_j = 1. At the end of this cycle, the match and the updated pointer values are as shown in the figure.
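The three RRM steps can be sketched in a few lines (a simplified sketch with 0-based ports; `requests[i]` is the set of outputs input i has cells for, and the pointers are updated in place):

```python
def rrm_cycle(requests, grant_ptr, accept_ptr):
    """One RRM cycle on an N x N switch. requests[i] = set of outputs
    requested by input i; grant_ptr / accept_ptr are length-N lists."""
    n = len(requests)
    grants = {}                                  # output j -> granted input
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if j in requests[i]:
                grants[j] = i
                grant_ptr[j] = (i + 1) % n       # RRM: advance unconditionally
                break
    granted = {}                                 # input i -> outputs granting i
    for j, i in grants.items():
        granted.setdefault(i, set()).add(j)
    match = {}                                   # input -> output
    for i, outs in granted.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match[i] = j
                accept_ptr[i] = (j + 1) % n      # advance only on accept
                break
    return match

# Fully loaded 2x2 switch: the grant pointers stay synchronized, so only
# one connection is made per slot (50 percent throughput).
reqs = [{0, 1}, {0, 1}]
g, a = [0, 0], [0, 0]
matches = [rrm_cycle(reqs, g, a) for _ in range(4)]
# matches == [{0: 0}, {1: 0}, {0: 1}, {1: 1}]
```

Running it on a fully loaded 2 x 2 switch reproduces the synchronization pathology analyzed in the next subsection.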

5.2.1 Analysis of RRM

Under heavy load, the grant counters may get synchronized, reducing utilization. Consider a 2 x 2 switch with all L_ij > 0. Then it is possible for only one connection to be made in every slot, as follows.

Match {(1,1)}
Match {(2,1)}
Match {(1,2)}
Match {(2,2)}

At the end of the fourth cycle the situation repeats. Throughput is 50 percent.

Of course the following TDM cycle is also possible, and has throughput of 100 percent.

Match {(1,1),(2,2)}
Match {(1,2),(2,1)}

Under heavy load, if the grant counters get synchronized at any time (i.e. have the same value), they'll stay synchronized forever.

Under light load, the grant counters will be randomly distributed, so the probability that some input is not served is small.

Figure 5.3: PIM can be unfair under heavy load. On a 2 x 2 switch the request rates are (1,1) = (1,2) = (2,1) = 1, but the acceptance rates are (1,1) = 1/4, (1,2) = 3/4, (2,1) = 3/4.

5.3 Parallel iterative matching, PIM

Step 1 Each unmatched input i sends requests to every output j such that L_ij > 0.

Step 2 Each output j randomly picks one input from the received requests and grants it.

Step 3 Each input randomly accepts one of the received grants.

The I in PIM means that this cycle is iterated to improve the match.

5.3.1 Analysis of PIM

It appears that with uniform iid traffic, PIM achieves a maximal match in 3 iterations.

In heavy load, every input requests every output. The probability that a given input receives no grant in one iteration equals (1 - 1/N)^N, which tends to 1/e as N grows, so a single iteration achieves about 63 percent throughput.

PIM can be unfair. Figure 5.3 gives a 2 x 2 case where the request rate from input i to output j is 1 for (i, j) = (1,1), (1,2) and (2,1). So these three requests are made in each slot. In the first iteration, output 1 grants inputs 1 and 2 with probability 1/2 each, while output 2 always grants input 1. Iterating to a maximal match, input 1 will accept output 1 with probability 1/4, and output 2 with probability 3/4; input 2 will accept output 1 with probability 3/4.

Thus even though arrival rates for output port 1 are equal at input ports 1 and 2, the acceptance rates are not the same.
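The acceptance rates of Figure 5.3 can be checked with a short simulation (a sketch, with 0-based ports and the three requests of the figure always backlogged; PIM is iterated to a maximal match each slot):

```python
import random

def pim_slot(requests, rng):
    """Iterate PIM (random grant, random accept) until no grants remain;
    the result is a maximal match for the request pattern."""
    match = {}
    free_in = set(range(len(requests)))
    matched_out = set()
    while True:
        # Step 2: each unmatched, requested output grants one input at random.
        grants = {}
        wanted = {j for i in free_in for j in requests[i]} - matched_out
        for j in sorted(wanted):
            cand = [i for i in sorted(free_in) if j in requests[i]]
            grants.setdefault(rng.choice(cand), []).append(j)
        if not grants:
            return match
        # Step 3: each input accepts one of its grants at random.
        for i, outs in grants.items():
            j = rng.choice(outs)
            match[i] = j
            free_in.discard(i)
            matched_out.add(j)

rng = random.Random(0)
requests = [{0, 1}, {0}]          # the pattern of figure 5.3, 0-based
counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0}
slots = 20_000
for _ in range(slots):
    for i, j in pim_slot(requests, rng).items():
        counts[(i, j)] += 1
rates = {k: v / slots for k, v in counts.items()}
# rates approach 1/4, 3/4, 3/4 for (1,1), (1,2), (2,1) respectively
```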

5.4 iSLIP matching

The detailed reference is [4]. RRM suffers from synchronization of the grant counters. iSLIP modifies RRM slightly so that a grant counter is incremented only if its grant is accepted. So step 2 of RRM is modified:

Step 2 Each output j grants the next requesting input at or after the current pointer value g_j; it then increments g_j to one beyond the granted input only if that input accepts output j.
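The modified pointer rule is a two-line change to the RRM sketch above (a simplified sketch with 0-based ports; both pointers now move only after an accept):

```python
def islip_cycle(requests, grant_ptr, accept_ptr):
    """One iSLIP iteration: like RRM, but g_j advances only if its
    grant is accepted, so pointers are updated after the accept step."""
    n = len(requests)
    grants = {}                               # output j -> input it grants
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if j in requests[i]:
                grants[j] = i                 # do NOT move g_j yet
                break
    granted = {}                              # input -> outputs granting it
    for j, i in grants.items():
        granted.setdefault(i, set()).add(j)
    match = {}
    for i, outs in granted.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match[i] = j
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n    # advance only on accept
                break
    return match

# Fully loaded 2x2 switch: after one warm-up slot the pointers
# desynchronize and every later slot carries a full match.
reqs = [{0, 1}, {0, 1}]
g, a = [0, 0], [0, 0]
sizes = [len(islip_cycle(reqs, g, a)) for _ in range(4)]
# sizes == [1, 2, 2, 2]
```

On the same fully loaded 2 x 2 pattern where RRM stays at 50 percent, the pointers desynchronize after the first slot and throughput goes to 100 percent.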

5.4.2 Priority iSLIP

Suppose there are P priority levels. Then each input i maintains P x N VOQs, with L^p_ij the buffer occupancy of priority p for output j. The scheduler gives strict priority, i.e. serves L^p_ij only if L^q_ij = 0 for every priority q higher than p. Each input maintains a counter a^p_i and each output maintains a counter g^p_j for each priority level.

Step 1 Each input i requests output j at the highest priority level p with a non-empty queue for j.

Step 2 Output j determines the highest priority level p among the requests it received. The output then chooses one input among those that have requested at level p, using the separate pointer g^p_j in the same round-robin scheme. The output notifies each input whether or not its request is granted. The pointer g^p_j is incremented to one beyond the granted input only if the granted input accepts output j.

Step 3 If input i receives any grants, it determines the highest priority level among them, say p. The input then chooses one grant among the grants at this level, according to the counter a^p_i, which is incremented to one beyond the accepted output. The input then notifies each output whether or not its grant was accepted.

5.4.3 Threshold iSLIP

It may be better to select a weighted maximal match with weights corresponding to queue length. If queue lengths are quantized by threshold levels T_1 < T_2 < ... < T_P, then priorities may be assigned accordingly: a VOQ gets priority p when T_p <= L_ij < T_{p+1}, and priority iSLIP is applied.

5.4.4 Weighted iSLIP

Suppose the bandwidth from input i to output j is to be shared according to ratios w_ij, subject to sum_i w_ij <= 1 and sum_j w_ij <= 1.

In iSLIP each counter is an ordered circular list (1, 2, ..., N). Now expand the list at output j to length W_j, where W_j is the lowest common denominator of the w_ij, and input i appears w_ij x W_j times in the list.
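The expanded circular list can be sketched as follows (assuming the weights are given as exact fractions and W_j is taken as the least common multiple of their denominators; a real scheduler would interleave the repeated entries for smoothness):

```python
from fractions import Fraction
from math import lcm

def expanded_grant_list(weights):
    """weights[i] = fraction of the output's bandwidth for input i.
    Returns the circular grant list in which input i appears
    w_i * W times, W = lcm of the weight denominators."""
    W = lcm(*(w.denominator for w in weights))
    order = []
    for i, w in enumerate(weights):
        order += [i] * (w.numerator * W // w.denominator)
    return order

# Three inputs sharing one output 1/2 : 1/4 : 1/4 -> a list of length 4.
print(expanded_grant_list([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))
# [0, 0, 1, 2]
```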

Figure 5.4: Interconnection of the input and output arbiters to construct the iSLIP scheduler: the state of the input queues (N^2 bits) feeds the N grant arbiters, whose decisions feed the N accept arbiters and then the decision register.

5.4.5 Implementation

Figure 5.4 shows how the iSLIP scheduler for an N x N switch is constructed from the input and output arbiters.

The state memory records whether each input queue is empty. From this memory, an N^2-bit wide vector presents N bits to each of the N output grant arbiters, representing Step 1 (request).

The grant arbiters select a single input among the contending requests to implement Step 2 (grant).

The grant decisions are presented to the N accept arbiters, each of which selects at most one output on behalf of its input to implement Step 3 (accept).

The final decision is stored in the decision registers and the values of the g_j and a_i pointers are updated. The decision register is used to notify each input which cell to transmit and to configure the crossbar switch.

    Chapter 6

    Network processors

Figure 6.1 is a logical diagram of how a network processor (NP) fits in a system design. The NP is located between the physical layer (MAC or framer) and the switch fabric. In the figure the Serializer/Deserializer (SERDES) is the interface between the NP and the switch fabric. The framer or MAC presents a packet to the NP, which must examine it, parse it, do the necessary edits and database lookups to enforce various policies at layers 3-7 (forwarding, queuing, labels), and exchange messages with the switch controller. The NP is in the data path.

    Figure 6.1: Location of NP in a logical diagram. Source [17].

6.0.6 NP operation

Figure 6.2 shows a generic block diagram. Data of multiple physical interfaces or the switch fabric are transferred to/from the NP. The bitstream processors receive the serial stream of packet data and extract the information needed to process the packet, such as MAC or IP source/destination addresses, TOS bits, TCP port numbers, and MPLS or VLAN tags. The packet is then written into the packet buffer memory. The extracted information is fed to the processor complex, the programmable unit of the NP. Under program control, the processor may extract additional information from the packet and submit relevant information to the search engine, which looks up the MAC or IP address, classifies the packet, or does a VCI/VPI lookup using the routing/bridging tables. Upon packet transmission through the bitstream processor, the necessary modifications to the packet header are performed.

Figure 6.2: Generic NP architecture: bitstream processors and a packet buffer memory at the PHY/switch-fabric interface; a processor complex with search engine and HW assists, backed by the routing and bridging tables; a buffer manager/scheduler; and a general-purpose CPU. Source [15].

Figure 6.3: Time to process 40B packets at different line rates. Source [17].

6.0.7 Speed of operations

Figure 6.3 shows the time available to process back-to-back 40B packets at different line speeds. At 1 Gbps, the time to process one packet is 360 ns. Using 10-ns SRAM permits a maximum of 36 memory accesses. Thus faster line rates can be accommodated only by processing several packets simultaneously, in a pipelined or parallel fashion.

6.0.8 Packet buffer memory

For the architecture of figure 6.2, each packet byte may traverse the memory interface at least four times:

write inbound packet to memory
read header into the processor complex
write modified header back to memory
read packet for outbound transmission

So for 40-byte back-to-back packets the required memory interface capacity is at least four times the line rate: 10-160 Gbps for line rates of 2.5-40 Gbps.
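This four-traversal budget is just a multiplier on the line rate (a sketch; the factor of four follows from the traversal count stated above):

```python
def min_memory_bandwidth_gbps(line_rate_gbps, traversals=4):
    """Minimum buffer-memory interface capacity when each packet byte
    crosses the interface `traversals` times (write in, read header,
    write back, read out)."""
    return traversals * line_rate_gbps

demands = [min_memory_bandwidth_gbps(r) for r in (2.5, 10.0, 40.0)]
# demands == [10.0, 40.0, 160.0] Gbps
```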

Chapter 7

Distributed switch
Figure 7.1: A distributed switch is a network of switches (each link of capacity 1) with a certain number of input and output ports: N input nodes and M output nodes.

7.2 Clos network

This is a 3-stage network as illustrated in Figure 7.2. The Clos network is specified by 5 numbers (n_IN, r_IN, m, n_OUT, r_OUT): there are r_IN + m + r_OUT switches arranged in 3 stages, with r_IN input switches of size n_IN x m, m middle switches of size r_IN x r_OUT, and r_OUT output switches of size m x n_OUT. The number of input-output ports and connectivity of the switches are as shown.

Theorem A Clos network with RNB switch modules is RNB iff m >= max(n_IN, n_OUT). A Clos network with SNB switch modules is SNB iff m >= n_IN + n_OUT - 1.

The total number of input lines is n_IN x r_IN. The total number of output lines is n_OUT x r_OUT.

The Clos network in the figure is SNB. It has 9 input lines and 8 output lines.
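The two nonblocking conditions are one-liners to encode (a sketch assuming the classical Clos conditions as stated in the theorem above):

```python
def clos_is_rnb(n_in, m, n_out):
    """Rearrangeably nonblocking: m middle switches suffice to rearrange
    any set of connections iff m >= max(n_in, n_out)."""
    return m >= max(n_in, n_out)

def clos_is_snb(n_in, m, n_out):
    """Strictly nonblocking (Clos 1953): m >= n_in + n_out - 1,
    so any new connection can be added without rearrangement."""
    return m >= n_in + n_out - 1

# A symmetric Clos with n = 3 needs m = 5 middle switches for SNB,
# but only m = 3 for RNB.
print(clos_is_snb(3, 5, 3), clos_is_rnb(3, 3, 3))
# True True
```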

7.3 Recursive construction

Figure 7.4: Recursive construction of an RNB Clos network with N = p x q: two outer stages of q switches of size p x p each, connected through a middle stage of p planes of size q x q.

Figure 7.4 is an N x N RNB switch if each module is RNB.

Figure 7.5: The Benes switch: recursing the construction down to 2 x 2 modules gives 2 log2 N - 1 stages of N/2 2 x 2 switches, built from two N/2 x N/2 Benes switches.

Figure 7.5 is an N x N RNB switch made up of 2 x 2 switch modules.
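The stage and switch counts of the Benes construction follow directly (a sketch assuming N is a power of two, as the recursion requires):

```python
def benes_stages(n):
    """An N x N Benes network has 2*log2(N) - 1 stages."""
    k = n.bit_length() - 1        # log2(n) for n a power of two
    return 2 * k - 1

def benes_switch_count(n):
    """Each stage holds N/2 switches of size 2 x 2."""
    return benes_stages(n) * (n // 2)

# An 8 x 8 Benes network: 5 stages of four 2x2 switches = 20 modules,
# versus 64 crosspoints for an 8 x 8 crossbar.
print(benes_stages(8), benes_switch_count(8))
# 5 20
```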

Figure 7.7 illustrates an algorithm to rearrange existing connections in order to accommodate a new connection.

Question 1: Can you supply a proof?

Question 2: Is there an algorithm to accommodate new connections in an arbitrary network of Figure 7.1?

Figure 7.7: Algorithm to add a new connection for an RNB switch

In a Benes switch, feasible flows may require multiple paths. Figures 7.8 and 7.9 show this.

Figure 7.8: Split flow 1. A 4 x 4 Benes network carrying flows of rates e, 1 - e, 1 - 2e and 2e, which must be split across multiple paths.

Figure 7.9: Split flow 2. The same 4 x 4 Benes network with flow rates e, 1 - e, 1 - 2e and 2e, routed differently but again split.

Figure 7.10: Max flow for a single commodity is 3 and the flows are integer; in the multi-commodity case, the max flows are 0.5 and non-integer.

In a Clos switch, permutations can be achieved without splitting flows. In a general multi-commodity network this is not so. Figure 7.10 shows that if this is a single-commodity problem, the maximum flow is 3 and all flows are 1 (integer).

However, if the three source-destination pairs are treated as distinct commodities, the max flows are 0.5 each, and not integer.

Figure 7.11: Two copies of figure 7.10 are connected in parallel. Achieving flows of 1, 1, 1 requires splitting.

Figure 7.11 shows that a feasible permutation may require splitting flows. The green and cyan flows must be connected in parallel similarly to the red flow.

    Bibliography

[1] J. Walrand and P. Varaiya. Chapter 12, Switching. High Performance Communication Networks, 2nd edition, 2000.

[2] M.J. Karol, M. Hluchyj and S. Morgan. Input vs output queueing on a space-division packet switch. IEEE Trans Comm, COM-35(12): 1347-56, Dec. 1987.

    [3] T.E. Anderson, S. Owicki, J. Saxe and C.P. Thacker. High-speed scheduling for local area

    networks. ACM Trans Computer Systems, 11(4):319-52, Nov. 1993.

[4] N. McKeown. iSLIP: a scheduling algorithm for input-queued switches. IEEE Trans Networking, 7(2), April 1999.

[5] N. McKeown, V. Anantharam and J. Walrand. Achieving 100% throughput in an input-queued switch. Proc. Infocom 96, vol 1: 296-302.

    [6] B. Prabhakar and N. McKeown. On the speedup required for combined input and output

    queued switching. Automatica, 35(12), Dec. 1999

    [7] J.F. Hayes, R. Breault and M.K. Mehmet-Ali. Performance analysis of a multicast switch.

    IEEE Trans Comm, COM-39(4): 581-87, April. 1991.

    [8] B. Prabhakar, N. McKeown and R. Ahuja. Multicast scheduling for input-queued switches. J.

    Selected Areas in Comm 15(5):855-66, June 1997.

[9] M. Waldvogel, G. Varghese, J. Turner and B. Plattner. Scalable high speed IP routing lookups. ACM Sigcomm 97, September 1997.

    [10] A. Demers, S. Keshav and S. Shenker. Analysis and simulation of a fair queueing algorithm.

    ACM Sigcomm 89 Computer Communication Review, 19(4): 1-12, 1989.

[11] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of integrated services networks: the single node case. IEEE Trans Networking, 1(3): 344-57, June 1993.

[12] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of integrated services networks: the multiple node case. IEEE Trans Networking, 2(2): 137-50, April 1994.

[13] S. Floyd and V. Jacobson. Random early detection. IEEE Trans Networking, 1(4): 397-413, August 1993.

    [14] I. Stoica, S. Shenker and H. Zhang. Core-stateless fair queuing: achieving approximately fair

    bandwidth allocations in high speed networks. ACM Sigcomm 98, 1998.

    [15] W. Bux, W.E. Denzel, T. Engbersen, et al. Technologies and building blocks for fast packet

    forwarding. IEEE Communications Magazine, 39(1): 70-77, January 2001.

    [16] P.R. Kumar and S. Meyn. Stability of queuing networks and scheduling policies. IEEE Trans.

    Automatic Control, 40(2), February 1995.

    [17] A. Deb. Building a network-processor based system. Integrated Communications Design,

    December 2000. Available at www.icdmag.com.