Bridges Routers


  • 8/4/2019 Bridges Routers


    Chapter 1

    Bridges, Switches, Routers

    1.1 Introduction

Packet vs. circuit (and virtual circuit) switching

Network: mesh interconnection of links and switches

LANs: multiaccess, broadcast or shared-medium (Ethernet: 10BT–1000BT, Cat 3 UTP)

WANs: switches connected by point-to-point links

Packet processors: bridges, routers, ATM switches


Figure 1.1: Packet processor functions may involve the data path (switching, policing, scheduling — per-packet processing) or the control path (routing, congestion control, reservation).

    1.2 Packet processor functions

Routing: creating and distributing information that defines the path between source and destination, and determining the best path

Switching: per-packet forwarding decisions, and sending the packet towards its destination

Other functions: congestion control, reservations, policing, scheduling

Control functions are performed infrequently; datapath functions are performed per packet.


Figure 1.2: Bridged extended LAN and corresponding graph: bridges B1–B4 interconnect LAN segments L1–L5, with link costs of 10 or 20; D marks the designated port for a LAN, R the root port of a bridge. The bridge forwards frames along the spanning tree, according to the FDB.

STP:

1. Determine the root bridge, and set its ports in forwarding mode.

2. Each bridge determines its root port, and sets it in forwarding mode.

3. Bridges determine the designated port for each LAN segment.

4. All other ports are in the blocked state.

    1.3 Transparent bridging IEEE 802.1D

Ethernet LANs broadcast each packet to every device on the LAN, so the throughput per host decreases with the number of hosts connected to the LAN. See Problem 1.

Transparent bridging prevents this by interconnecting LAN segments (collision domains) and forwarding unicast packets according to a filtering database (FDB). Broadcast, multicast, and unknown unicast packets are flooded to all LANs, so all segments form a single broadcast domain.

A bridge has two or more ports. Packets from incoming ports are forwarded to outgoing ports along a spanning tree to prevent loops, according to the FDB. See Figure 1.2.

spanning tree algorithm: one root, then shortest path to root;

learning process: produces the FDB by relating each MAC source address to its incoming port, and removing unrefreshed entries.

Bridges exchange configuration messages to establish the topology, and topology-change messages to indicate that the STA should be rerun.

With a fixed number of bridge ports, the throughput per LAN segment decreases with the number of segments in an extended LAN. See Problem 2.
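The learning process and FDB-based forwarding above can be sketched as follows. This is a minimal illustration, not the 802.1D state machine; the class and names (`Bridge`, `AGEING_TIME`) are hypothetical, and only known-unicast filtering, flooding, and entry ageing are modeled.

```python
import time

AGEING_TIME = 300.0  # seconds; ageing time for unrefreshed FDB entries

class Bridge:
    """Toy learning bridge: FDB maps MAC address -> (port, last-seen time)."""
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.fdb = {}

    def receive(self, frame_sa, frame_da, in_port, now=None):
        now = time.time() if now is None else now
        # Learning: relate the MAC source address to the incoming port.
        self.fdb[frame_sa] = (in_port, now)
        # Remove unrefreshed entries.
        self.fdb = {mac: (p, t) for mac, (p, t) in self.fdb.items()
                    if now - t <= AGEING_TIME}
        # Forwarding: known unicast goes out one port; unknown unicast
        # (like broadcast and multicast) is flooded to all other ports.
        entry = self.fdb.get(frame_da)
        if entry is not None:
            port, _ = entry
            # Filter if the destination is on the incoming segment.
            return [port] if port != in_port else []
        return [p for p in range(self.num_ports) if p != in_port]
```

A frame to an unknown destination is flooded; once the destination has been heard from, later frames are forwarded out a single port.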


    Figure 1.3: LAN vs VLAN topology

    Figure 1.4: VLAN tags

    1.4 LAN switches IEEE 802.1Q

A LAN switch is a bridge with as many ports as there are LAN segments, and with enough capacity to handle the traffic on all segments. Problem 2 is solved through VLANs.

A virtual LAN or VLAN is a collection of LAN segments and attached devices with the properties of an independent LAN. Each VLAN is a separate broadcast domain: traffic on one VLAN is restricted from going to another VLAN. Traffic between VLANs goes through a router.

VLAN tags carrying a VLAN ID or VID (a 4-byte tag) are added to MAC frames so switches can forward packets to ports with the same VID. The FDB is augmented to include, for each VID, the ports (the member set) through which members of that VLAN can be reached.


The member set is derived from VLAN registration information: (i) explicitly by management action, or (ii) by the GARP VLAN registration protocol (GVRP). GARP is the generic attribute registration protocol.

Multicast filtering A VLAN is a single broadcast domain. If multicast messages are broadcast, the throughput is limited by the slowest link: a switch with 124 10-Mbps ports has a capacity of 1.24 Gbps but can transmit at most 6 1.5-Mbps multicast video channels. The GARP Multicast Registration Protocol (GMRP) (IEEE 802.1p) allows switches to limit multicast traffic along the spanning tree. (See IGMP.)

JOIN a host sends this message to express interest in joining a multicast group. The switch adds the port to the multicast group and forwards the multicast source to these ports. JOIN messages are resent once every JOINTIME timeout.

LEAVE message sent by a host. The switch removes this port from the multicast group unless another host on that port sends a JOIN message before the LEAVETIME timeout.

LEAVEALL message periodically sent by the switch.

When a host sends IP data to a multicast (Class D IP) address, the host inserts the low-order 23 bits of the IP address into the low-order 23 bits of the MAC address, so a NIC that is not part of the group ignores these data.
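The 23-bit mapping above (RFC 1112: the low-order 23 bits of the Class D address are placed into the multicast MAC prefix 01:00:5e) can be computed directly; the function name is illustrative:

```python
def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast (Class D) address to its multicast MAC address."""
    octets = [int(x) for x in ip.split(".")]
    addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    low23 = addr & 0x7FFFFF           # keep only the low-order 23 bits
    mac = 0x01005E000000 | low23      # OR into the fixed 01:00:5e prefix
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -8, -8))
```

Since 5 bits of the IP address are discarded, 32 different Class D addresses share each MAC address, so the NIC's filter is only a first-level filter.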


    Figure 1.5: The IP header provides precedence and type of service fields

Quality of service The 3-bit precedence field allows 8 priority levels. The ToS bits are D (minimize delay), T (maximize throughput), R (maximize reliability), and C (minimize cost).

802.1 provides no support for priority. 802.1p provides in-band QoS signalling with 8 CoS levels; a conforming bridge or switch maintains 8 queues. (VLAN tags may also carry priority information.)


    1.5 Problems

1. "The throughput per host decreases with the number of hosts connected to the LAN." Formulate two mathematical models, one deterministic and one stochastic, in which this quote is an assertion. Then prove or disprove the assertion. You will have to model LAN speed, host load, and throughput.

Hint: Use the M/M/1 model of Section 3.3.

2. Following Figure 1.2, propose a graph model for an extended bridged LAN in which bridges may have multiple ports.

(a) Use the graph to formulate two mathematical models, one deterministic and one stochastic, within which one can determine the throughput per LAN segment.

(b) How would you formulate as a mathematical assertion the statement "the throughput per LAN segment increases with the number of ports in the bridge"?

Hint: Try the Jackson network model of Section 3.3.

3. Discuss the differences between STP and OSPF in terms of throughput or efficiency in link utilization.


    Chapter 2

    Processor architecture

    2.1 Datapaths

When a packet arrives at a bridge:

The DA is searched in the forwarding table (DA → output ports). If not found, the packet is broadcast to all output ports;

If found, it is forwarded across the switching fabric to the appropriate output port (or ports, for multicast);

The SA is learned and added to the forwarding table;

During transfer to the fabric the packet may be stored, or dropped if storage is full;

The packet is stored in the output port queue (usually FIFO) and eventually transmitted.

When a packet arrives at a router:

The DA is searched in the forwarding table. If not found, the packet is dropped;

If found, the next-hop MAC address is appended, the TTL is decremented, a new Header Checksum is calculated, and the packet is forwarded across the switching fabric to the output port or ports;

During transfer to the fabric the packet may be stored: if storage is full, this (or another) packet may be dropped;

The packet is stored in the output queue (FIFO or more complex) and eventually transmitted.

When a cell arrives at an ATM switch:

Its VCI is searched in the forwarding table (VC translation table: (VCI in, Port in) → (VCI out, Port out)). If not found, the cell is dropped;


If the VCI is policed, the policing function determines whether the cell is conformant; if not, it may be dropped. If conformant, the cell is forwarded across the switching fabric to the output port;

During transfer, the cell may be stored: if storage is full, this or another cell may be dropped;

The cell is stored in the output queue and eventually transmitted. The service discipline may be FIFO or very elaborate.


Figure 2.1: Basic packet processor architectures A, B, C, D, built from CPUs, memory, and line cards.

Throughput in A is limited by CPU speed;

In B, there is a choice of which CPU forwards the packet;

In C, the packet travels the bus only once, so throughput is limited by bus speed;

In D, several packets can be forwarded through the crossbar in parallel.

General-purpose CPUs are not well suited for applications in which packets flow through once. CPUs are better when the same data are examined several times, making use of the cache.


Figure 2.2: Elaboration of datapath functions. Data path (per-packet processing): forwarding decision, switching fabric, policing, scheduling. Control path: routing, congestion control, reservation.

    2.2 Performance

The packet delay through a switch fabric consists of the time (1) for the forwarding decision, and (2) to transfer the packet across the switch.

The packet delay through a processor consists of the time (1) for the policing decision, (2) for the forwarding decision, (3) to transfer across the switch, and (4) for the output scheduling decision.


Figure 2.3: Delay of switch and packet processor: header arrival time, forwarding decision time, switch transfer time, and output scheduling decision time, against packet size, minimum back-to-back packet size, and packet arrival rate.

    2.3 Forwarding decision

Criteria: (1) speed of address lookups, which depends on the number of memory references; (2) size of memory.

ATM switches perform direct lookup, Figure 2.4. The VCI address space is 2^24 = 16 M. Most switches contain far fewer entries, since it is the downstream switch that chooses a VCI that fits in the supported address space (PNNI).

For multicast, the lookup returns a list of output ports, each with a different VCI.

Figure 2.4: ATM switches perform direct lookup: the VCI addresses a DRAM whose data is (port, new VCI).


Figure 2.5: CAM or content-addressable memory. The 48-bit MAC (network) address is presented. A successful parallel search asserts the hit signal and returns a pointer (log2 N bits for a size-N memory) to the entry where the forwarding information for the MAC address is stored.

Bridges The address space is 2^48, so direct lookup is not possible. Three indirect lookup techniques:

Associative memory. Figure 2.5. Typical CAM sizes are limited to a few thousand entries, so CAMs are not suitable for large LANs that require far more entries.


Figure 2.6: A 48-bit address is presented; the hashing function returns a pointer (16 bits, log2 N) into a DRAM holding N linked lists that cover M addresses. The search through a linked list takes a random time proportional to the length of the list.

Hashing. For large LANs hashing is an option. Suppose the LAN has M hosts. A hashing function, h, maps a host's 48-bit address to a forwarding table with, say, N entries, as in Figure 2.6.

Two addresses x ≠ y may collide: h(x) = h(y). The entry points to a linked list of (MAC address, forwarding data) pairs for the MAC addresses that map into the same entry. The list must be searched sequentially to locate the MAC address. The duration of the search is proportional to the length of the list.

Suppose h maps the M MAC addresses x_1, ..., x_M into the N linked lists 1, ..., N. Assume that h(x_1), ..., h(x_M) are independent and uniformly distributed over {1, ..., N}. The length of the i-th list is the random number

L_i = Σ_{j=1}^{M} 1{h(x_j) = i}.   (2.1)

Let ρ = M/N. If ρ is small (number of lists larger than number of possible addresses), the lists will usually have 0 or 1 elements. Problem 3 asks to find the distribution of L_i. The mean length of a list is ρ = M/N. However, the L_i being random, there is a chance that some lists (and the corresponding search time) may be very large. For real-time applications, you may store forwarding tables in such a way (e.g. as trees) that retrieval has a deterministic bound.


Prefix            Outgoing port
128.32.0.0/16     1
128.32.239.0/24   7
128.32.239.3/32   3

Figure 2.7: Forwarding table with CIDR

IP routers. With CIDR, router forwarding table entries are identified by a pair (route prefix / prefix length), with prefix length between 0 and 32 bits. See Figure 2.7. The entry 128.32.0.0/16 is a 16-bit-long entry.

The forwarding decision must find the longest prefix match between the packet's destination IP address and the prefixes in the forwarding table.

CIDR reduces the table size, but the forwarding decision is more complex. See [9].

With declining memory cost, it may be more economical to expand the prefixes and use simpler, exact-matching algorithms.
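A minimal longest-prefix match over the table of Figure 2.7, done here by linear scan from the most specific matching prefix (real routers use tries, CAMs, or the expanded exact-match tables mentioned above; this sketch only illustrates the matching rule):

```python
import ipaddress

TABLE = [  # (prefix, outgoing port), from Figure 2.7
    (ipaddress.ip_network("128.32.0.0/16"), 1),
    (ipaddress.ip_network("128.32.239.0/24"), 7),
    (ipaddress.ip_network("128.32.239.3/32"), 3),
]

def lookup(dst: str):
    """Return the port of the longest matching prefix, or None (drop)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in TABLE:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return best[1] if best else None
```

Note how 128.32.239.7 matches both the /16 and the /24 but is forwarded on port 7, the longer match.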


Caching. The forwarding decision delay can be reduced by caching. The idea is that the IP destination addresses of successive packets are correlated.

The cache stores the full source and destination IP addresses and the corresponding forwarding decision (perhaps including the entire replacement IP header).

When a packet arrives, the SA and DA are used to do a full match in the local cache. If the addresses are not there, the packet is forwarded to a central routing processor. A cache replacement rule is needed when there is a cache miss.

The improvement in delay depends on (1) the ratio of the cache size to the size of the forwarding table, and (2) the temporal locality. The latter is likely to be higher in a campus router than in an edge router, and higher there than in a core router. See Problem 4.

Multicast. Some routers support multicast. The simplest rule is RPF (reverse-path forwarding): if a multicast packet arrives on port p from source S, look up S in the forwarding table. If p is the best port to reach S, forward the packet on all ports except p.

Switching fabrics Need some queuing models.


    2.4 Problems

    1. For a commercial LAN switch, find the various times in Figure 2.3. Also give the throughput.

    See, for example, www.bcr.com/bcrmag/08/98p25.htm

    2. If forwarding decision, switch transfer, and output scheduling can be pipelined, what is the

    throughput of the processor?

3. Find the (marginal) distribution of the list lengths L_i given in (2.1), and calculate the mean length of a list. Show that for ρ small, the mean is approximately ρ.

Find the joint distribution of (L_1, ..., L_N). Verify that it has the product form, where the denominator is the normalizing constant.

For given values of M and N, find the probability that a list is much longer than the mean.

Suppose a memory access takes 100 ns. Consider back-to-back Ethernet packets. What is the average throughput of this switch, using the model of Figure 2.3 and ignoring the output scheduling decision delay?

4. The packets arriving at a line card belong to several multiplexed TCP connections.

(a) Formulate a model of packet arrivals with, say, K simultaneous connections, in which connections last a random amount of time with a geometric distribution and a given mean.

(b) Suppose the size of the cache is C. If there is a cache miss, an existing entry is replaced by the missing entry. How would you calculate the hit ratio as a function of C?

(c) Suppose you are given a typical trace of the addresses of packet arrivals, but no model of the arrival process. You want to know how big a cache you would need so that the hit ratio is a certain value. What would you do?

(d) The time to search the cache is T1, the time to search the central forwarding table is T2, and the hit ratio is h. How would you decide whether it's worth having a cache?


    Chapter 3

    Queuing

    3.1 Discrete time Markov chains

X = {X_t : t = 0, 1, ...} is a Markov chain with state space S, finite or countable, stationary transition probability matrix P = {P(i, j)}, and initial distribution π_0. So

P(X_{t+1} = j | X_t = i, X_{t−1}, ..., X_0) = P(i, j)   (3.1)

for all i, j, t.

π_t is the marginal distribution of X_t, written as a row vector. From (3.1),

π_{t+1} = π_t P.   (3.2)

π is invariant if it satisfies the balance equations

π = π P.   (3.3)

X is irreducible if it can go from any state i to any other state j (with positive probability). Irreducible chains have at most one invariant distribution. The chain is positive recurrent if it has one invariant distribution.

If X is irreducible and positive recurrent with invariant distribution π,

(1/T) Σ_{t=1}^{T} 1{X_t = i} → π(i) as T → ∞,   (3.4)

i.e. π(i) is the fraction of time X spends in state i.

X is aperiodic if d = 1, where d = gcd{t ≥ 1 : P^t(i, i) > 0}. If d > 1, X is periodic with period d.


3.2 Continuous-time Markov chains

Figure 3.2: A trajectory x_t of the process in the theorem below, with sets S1, S2, S3 at times t1 < t2 < t3.

Theorem (Markov property) For any set A of trajectories,

P({X_s, s > t} ∈ A | X_s, s ≤ t) = P({X_s, s > t} ∈ A | X_t).

Such an A is of the form {X_{t1} ∈ S1, X_{t2} ∈ S2, X_{t3} ∈ S3}, with t < t1 < t2 < t3. See Figure 3.2.


X is irreducible if it can go from any state i to any other state j, i.e. if its rate matrix Q is irreducible, where Q(i, j) is the jump rate from i to j ≠ i and Q(i, i) = −Σ_{j ≠ i} Q(i, j).

Theorem Suppose X is a continuous-time Markov chain with rate matrix Q and initial distribution π_0. Then

1. π is invariant (π_t = π for all t) iff the balance equations hold:

π Q = 0;   (3.6)

2. X has at most one invariant distribution π, and then the fraction of time spent in state i converges to π(i);

3. If X has no invariant distribution, the fraction of time spent in each state converges to 0.


Theorem (Time reversal) Suppose X is a stationary continuous-time Markov chain with rate matrix Q and distribution π. The time-reversed process X̃_t = X_{−t} is stationary, Markov, with distribution π and rate matrix Q̃, where

π(i) Q̃(i, j) = π(j) Q(j, i).


Figure 3.3: Diagrams for the M/M/1 system: the state transition diagram on states 0, 1, 2, 3, ... and a sample path x_t. Arrivals (blue) and departures (red) form Poisson processes.

3.3 M/M/1 model

See Figure 3.3. The balance equation (3.6) is

λ π(n) = μ π(n + 1), n = 0, 1, ...,

which has a (unique) solution iff ρ = λ/μ < 1:

π(n) = (1 − ρ) ρ^n, with ρ = λ/μ.   (3.7)


The queue x_t is time-reversible, because λ π(n) = μ π(n + 1), so the rate matrix of the time-reversed process, Q̃, is the same as that of x.

So the departures before time t form a Poisson process with rate λ, independent of x_t. Surprise!

The mean queue length is

E[x] = Σ_n n π(n) = ρ / (1 − ρ).

For ρ = 10/11 ≈ 0.91, the mean is 10 packets.

Above,
λ = av. number of (exponentially spaced) packet arrivals per sec;
μ = av. number of packets that can be transmitted per sec;
ρ = λ/μ = av. utilization.


A packet arriving at time t sees x_t packets in queue, with

P(x_t = n | packet arrives in (t, t + dt)) = π(n),

so the average time between arrival and departure (including the packet's service or transmission time) is

T = Σ_n π(n) (n + 1)/μ = 1/(μ − λ).

Alternatively, T = E[x]/λ by Little's law.

Example Consider a 10 Gbps link. Packet lengths are exponentially distributed with mean length 10,000 bits.1 So μ = 10^6 packets/s and 1/μ = 1 μs per packet.

Link utilization is 90 percent, i.e. ρ = 0.9. Then the average number of packets in the buffer is ρ/(1 − ρ) = 9. The average delay faced by a packet, including its own service (transmission) time, is 1/(μ − λ) = 10 μs.

If the packet goes through 10 nodes, the average delay is 100 μs (assuming independence of nodes).

For a 100 Mbps link, with the same packet length distribution, μ = 10^4 packets/s, 1/μ = 100 μs/packet, and the average delay is 1000 μs per link.

The probability of 100 or more packets in the buffer is ρ^100 = 0.9^100 ≈ 2.7 × 10^−5.

Compare the queuing delay with the propagation delay of 5 μs/km = 15 ms for a 3,000 km link.

The possible number of bits in flight in the 3,000 km, 10 Gbps link is 15 ms × 10 Gbps = 1.5 × 10^8.

1What is a more realistic distribution?
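The example's numbers follow directly from the M/M/1 formulas; the short calculation below reproduces them (under the same exponential-packet-length assumption):

```python
# M/M/1 example: 10 Gbps link, mean packet length 10,000 bits, 90% load.
link_bps = 10e9
mean_pkt_bits = 10_000.0

mu = link_bps / mean_pkt_bits          # service rate: 1e6 packets/s
rho = 0.9                              # utilization
lam = rho * mu                         # arrival rate

mean_in_buffer = rho / (1 - rho)       # average number of packets: 9
delay_s = 1 / (mu - lam)               # average delay incl. service: 10 us
p_tail = rho ** 100                    # P(100 or more packets in buffer)
bits_in_flight = 15e-3 * link_bps      # 15 ms propagation at 10 Gbps
```

The tail probability shows why a modest buffer suffices at this load, while `bits_in_flight` shows that the propagation pipe dwarfs the queue.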


Alternative formulation Let A_t be a Poisson counting process with rate λ — the arrival process. Let S_t be a Poisson counting process with rate μ — the virtual service process. A and S are independent.

The queue at time t is given by

x_t = x_0 + A_t − D_t,

where the departure counting process is

D_t = ∫_0^t 1{x_s > 0} dS_s.

D is also Poisson. Moreover,

Future arrivals, {A_s − A_t : s > t}, and the current state, x_t, are independent;

Past departures, {D_s : s ≤ t}, and the current state, x_t, are independent.


Figure 3.4: Parameters of a Jackson network: external traffic enters line i at rate γ_i pkt/sec, line i transmits at rate μ_i pkt/sec, and traffic from the network is routed from line j to line i with probability r(j, i).

Jackson network See Figure 3.4. Assumptions:

Independent, exponential service times with rate μ_i at node i;

Markovian routing: a packet leaving node j joins node i with probability r(j, i);

Poisson external arrivals at rate γ_i packets/sec;

The aggregate arrival rate λ_i into node i satisfies

λ_i = γ_i + Σ_j λ_j r(j, i), all i.   (3.8)

Let x_t = (x_t(1), ..., x_t(J)) be the queue-length process. This is Markovian. Problem 5 asks to find its rate matrix.
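The traffic equations (3.8) can be solved by fixed-point iteration, λ ← γ + λR, which converges when every packet eventually leaves the network. The 3-node topology below is a hypothetical example, not one from the text:

```python
def traffic_rates(gamma, R, tol=1e-12):
    """Solve lambda_i = gamma_i + sum_j lambda_j * R[j][i] (eq. 3.8)."""
    n = len(gamma)
    lam = list(gamma)
    while True:
        nxt = [gamma[i] + sum(lam[j] * R[j][i] for j in range(n))
               for i in range(n)]
        if max(abs(nxt[i] - lam[i]) for i in range(n)) < tol:
            return nxt
        lam = nxt

gamma = [1.0, 0.0, 0.0]        # external traffic enters only at node 0
R = [[0.0, 0.5, 0.5],          # node 0 splits traffic to nodes 1 and 2
     [0.0, 0.0, 0.0],          # nodes 1 and 2 route all traffic out
     [0.0, 0.0, 0.0]]
lam = traffic_rates(gamma, R)  # aggregate rates: [1.0, 0.5, 0.5]
```

With these λ_i and given μ_i, the product-form theorem below gives the invariant distribution directly, ρ_i = λ_i/μ_i per node.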


Theorem Assume λ_i < μ_i, all i. Then x has an invariant distribution of the product form

π(n_1, ..., n_J) = π_1(n_1) ⋯ π_J(n_J), where π_i(n) = (1 − ρ_i) ρ_i^n with ρ_i = λ_i / μ_i.

This is a surprising result. The departures from a node in the Jackson network need not be Poisson, unlike the case of a single M/M/1 system.


Figure 3.5: The M/M/m/∞ system: states 0, 1, 2, 3, ...; an arriving request is routed to the first free server, and the departure rate from state k is min(k, m) μ.

3.4 Other M/M/m/n models

M/M/m, the m-server case The received request is routed to the first of m available servers, Figure 3.5. The buffer is infinite. The balance equations are

λ π(k) = min(k + 1, m) μ π(k + 1), k = 0, 1, ....

This gives

π(k) = π(0) (mρ)^k / k! for k ≤ m, and π(k) = π(m) ρ^{k−m} for k ≥ m,   (3.9)

where ρ = λ/(mμ). It is assumed that ρ < 1. π(0) is obtained using Σ_k π(k) = 1.

A packet arriving at time t sees all servers busy (x_t ≥ m) with probability

P(x_t ≥ m) = Σ_{k ≥ m} π(k) = π(m)/(1 − ρ),

from (3.9). The expected number of packets waiting in queue (not in service) is

E[Q] = Σ_{k > m} (k − m) π(k) = π(m) ρ/(1 − ρ)^2.


By Little's law (see below), the average waiting time in queue (not in service) is W = E[Q]/λ, and the total latency (waiting plus service time) is T = W + 1/μ.


3.5 Little's law

Suppose A(t) is the cumulative number of arrivals in [0, t] into a stable queueing system and x(t) is the number of packets in the system (including those in service). Let W_n be the latency of packet n, and λ the arrival rate.

Suppose the queue is empty at t = 0 and at t = T. From Figure 3.6, the time average of the queue size is

(1/T) ∫_0^T x(t) dt = (1/T) Σ_{n=1}^{A(T)} W_n = (A(T)/T) · (1/A(T)) Σ_{n=1}^{A(T)} W_n.

Taking limits as T → ∞, and if time averages equal ensemble averages, we get

E[x] = λ E[W].

Figure 3.6: Calculations for Little's law: the area under x(t) decomposes into the latencies W_n of the successive packets S_1, S_2, ....


    3.6 PASTA

    We have used the PASTA property (Poisson arrivals see time averages) several times.

    Consider stationary queuing system with deterministic service time of 3 and periodic arrivals (period

    10). A sample path with arrivals at 1,2,3,11,12,13,21,22,23,

    and queue process

    is shown in

    figure 3.7.

    Let

    be the probability that

    at any time

    , and let

    be the probabililty that an

    arriving packet sees

    packets in queue. For this system,

    so the two probabilities are not the same.

    1 2 3 4 5 6 7 1110

    x(t)

    Figure 3.7: PASTA property does not hold in this deterministic queuing system

    Consider a M/G/1 system, with stationary probabilities

    . Let

    be the probability that an

    arrival sees

    packets in queue. Then,

    packet arrives in

    packet arrives in

    packet arrives in

    using Bayes rule, independence of arrivals after

    from

    , and independence of service

    times.


Figure 3.8: Deriving the Pollaczek-Khinchin formula: the remaining waiting time W(t), with the service times S_n and waiting times W_n of successive packets.

3.7 Pollaczek-Khinchin formula

Consider an M/G/1 system with independent service times S_1, S_2, ..., and Poisson arrivals with rate λ. Let W(t) be the remaining waiting time, i.e. the amount of time needed to serve the packets in the system at time t. Let S_n and W_n be the service time and waiting time of packet n; see Figure 3.8.

In the time average of W(t), packet n contributes a triangle of area S_n^2/2 and a parallelogram of area W_n S_n, so

(1/T) ∫_0^T W(t) dt = (1/T) Σ_{n=1}^{A(T)} (S_n^2/2 + W_n S_n).

Substituting and taking limits as T → ∞,

E[W(t)] = λ E[S^2]/2 + λ E[W] E[S].

By PASTA, E[W(t)] = E[W] = the waiting time faced by an arrival. So,

E[W] = λ E[S^2] / (2 (1 − ρ)),

where ρ = λ E[S] is the utilization.

Note: The formula E[Σ_{n=1}^{A(T)} S_n] = E[A(T)] E[S], involving a sum of a random number of terms, is sometimes called Wald's formula. A general version of Wald's formula is a consequence of the fact that Σ_{n ≤ k} (S_n − E[S]) is a martingale. See Problem 8.

Determinism minimizes waiting In general, E[S^2] ≥ (E[S])^2, so

E[W] = λ E[S^2]/(2(1 − ρ)) ≥ λ (E[S])^2/(2(1 − ρ)),

where the last expression is the waiting time for a deterministic service time (e.g. ATM cells).


    Chapter 4

    Switching

    4.1 Packet switching

Architectures: IQ/HOL, VoQ, SQ


Figure 4.1: Packet switch architectures. IQ: HOL blocking; OQ: faster switch; VoQ: matching; SQ: reduced buffer size.

    4.1.1 Architectures

The second-generation PRIZMA architecture, with 2 Gbps ports ... all on one chip [15].


Figure 4.2: Virtual HOL queue: arrivals A_t from unblocked queues join the HOL queue x_t, which serves min(1, x_t) per slot; the accompanying plot shows average delay in cell times versus offered load.

4.1.2 Input queues

Assume:

discrete time t = 0, 1, ...; independent arrivals; uniform destinations with probability 1/N;

N large, so the total number of port-1 arrivals per slot is Poisson.

Virtual HOL queue of x_t port-1 packets at the heads of the input queues:

x_{t+1} = x_t − min(1, x_t) + A_t,   (4.1)

where A_t = number of new port-1 packets that come to the heads of unblocked queues.

In equilibrium, taking expectations in (4.1) gives E[A] = E[min(1, x)], so A_t is Poisson with mean ρ = E[A] equal to the per-port throughput. Square (4.1) and take expectations:

E[x] = ρ(2 − ρ) / (2(1 − ρ)).   (4.2)

Under saturation each of the N inputs holds exactly one HOL packet, so the N virtual HOL queues together hold N packets and E[x] = 1. From (4.2), ρ^2 − 4ρ + 2 = 0, i.e.

ρ = 2 − √2 ≈ 0.586.

That is, about 41% of the switch bandwidth is not utilized.

Quick upper bound Same switch, but at the end of each cycle the IQs are flushed. With N → ∞, the switch throughput is N(1 − (1 − 1/N)^N) → N(1 − e^{−1}).

The per-port throughput is 1 − 1/e ≈ 0.632.
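The saturation throughput follows from setting the mean HOL backlog of (4.2) to 1, and the flushed-switch bound from the limit above; both are a couple of lines to verify:

```python
import math

# HOL saturation: rho*(2 - rho) / (2*(1 - rho)) = 1 gives
# rho^2 - 4*rho + 2 = 0, whose root in (0, 1) is 2 - sqrt(2).
rho = 2 - math.sqrt(2)
backlog = rho * (2 - rho) / (2 * (1 - rho))   # should equal exactly 1
wasted = 1 - rho                              # unutilized fraction, ~0.414

# Flushed-IQ upper bound: expected fraction of distinct destinations
# among N uniform choices, N*(1 - (1 - 1/N)**N)/N -> 1 - 1/e.
N = 1_000_000
per_port = 1 - (1 - 1 / N) ** N               # close to 1 - 1/e ~ 0.632
```

The two numbers bracket the familiar IQ-switch story: 0.586 with HOL blocking, at most about 0.632 even with per-cycle flushing.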


4.1.3 Virtual output queue

Each input port has N VoQs, one per output port.

If several input ports have packets for the same destination, which one should be served?

Assume i.i.d. arrivals A_t = (A_t(i, j)) with rates λ(i, j) = E[A_t(i, j)] such that

Σ_i λ(i, j) < 1 and Σ_j λ(i, j) < 1.

Service S_t = (S_t(i, j)), with S_t(i, j) ∈ {0, 1}, such that

Σ_i S_t(i, j) ≤ 1 and Σ_j S_t(i, j) ≤ 1.

Note: If equality holds above, S_t is a permutation matrix over {1, ..., N}.

Queue lengths x_t = (x_t(i, j)) evolve as

x_{t+1}(i, j) = x_t(i, j) − min(S_t(i, j), x_t(i, j)) + A_t(i, j).

Question: Given λ, find S_t, based on past arrivals and queue lengths, so that x_t is stable.

Conjecture: There always exists a stabilizing matching S.
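One natural candidate for a stabilizing policy is to serve, each slot, the permutation of maximum total queue weight. The brute-force version below enumerates all N! permutations, so it is only a toy illustration of the rule for small N, not a practical scheduler:

```python
from itertools import permutations

def max_weight_matching(x):
    """Pick the permutation S maximizing sum_i x[i][S(i)] (brute force)."""
    n = len(x)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        w = sum(x[i][perm[i]] for i in range(n))
        if w > best_weight:
            best_perm, best_weight = perm, w
    return best_perm, best_weight

# VoQ lengths x[i][j]: input i's queue of packets for output j.
x = [[3, 0, 1],
     [0, 2, 0],
     [1, 0, 4]]
perm, weight = max_weight_matching(x)
```

For this x the heaviest matching serves (0,0), (1,1), (2,2), with total weight 9; iSLIP-style schedulers below approximate this kind of matching distributively.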


Figure 4.4: Switch with time-division shared bus and centralized shared memory: inputs 1 to N and outputs 1 to N share queues 1 to N in a common memory; a packet from input 1 to output N crosses the shared bus.

4.2 Shared queue

This architecture is used in most low-speed packet processors: a time-division bus with a centralized memory shared by all input and output lines, Figure 4.4. Up to N packets may arrive at one time and up to N may be read at one time, so the memory bandwidth must be 2N times the line rate. Assuming a 100 ns DRAM access time and a 53-byte-wide bus, the total memory bandwidth is 53 × 8 bits / 100 ns = 4240 Mbps. For a 16-port ATM switch, this gives a line rate of 4240/(2 × 16) ≈ 132 Mbps.

Let x_t = size of the 1-list (packets for output 1) at the beginning of slot t, and A_t = # of 1-packets arriving in slot t:

x_{t+1} = x_t − min(1, x_t) + A_t.   (4.5)

Following the same argument that led to (4.2) gives

E[x] = (ρ − 2ρ^2 + E[A^2]) / (2(1 − ρ)),

where ρ = E[A] and E[A^2] is the second moment of A. For the Poisson case, E[A^2] = ρ + ρ^2, so

E[x] = ρ(2 − ρ) / (2(1 − ρ)).   (4.6)

Shared vs separate queue Suppose the shared buffer is sized at N E[x] + α√N σ, where σ is the standard deviation of x and α sets the overflow probability; by the central limit theorem the total occupancy concentrates around N E[x] with deviation √N σ. Separate buffers must each be sized at E[x] + ασ, for a total of N E[x] + αNσ. So sharing reduces the safety margin by a factor of √N.


    4.3 Output queue

In an output-queued switch the switch fabric must run N times the line rate, and the output memory must run (N + 1) times the line rate. The queue length in port 1 is given by (4.5).


    4.4 Problems

1. Assume that A_t is Poisson in (4.1) or (4.5). The mean queue size is given by (4.6).

(a) Is x_t Markov? Why?

(b) If x_t is stationary, how would you find its distribution?


    Chapter 5

    Matching

Crossbar switches need a controller to schedule the switch. The controller must find a good match, e.g. longest queue first, oldest cell first, etc.

It is too expensive to run a centralized maximum matching algorithm, with complexity O(N^2.5) or worse. (A 40-byte packet at a line speed of 1 Gbps allows only 320 ns per packet.)

So one may have to be satisfied with maximal matching, using distributed algorithms. Note that for a fully-connected bipartite graph, a maximal matching is also maximum.

In case of QoS, the matching must also satisfy some preferences.

5.1 The dating game

The GSA algorithm. Say that a man or woman is

free if she/he is not engaged or matched to any man/woman,
engaged if she/he is temporarily matched to some man/woman,
matched if she/he is terminally matched.

BEGIN: all are free. While some man m is free, m proposes to w, the first woman he has not yet proposed to. If w is free, w becomes engaged to m. Otherwise w is currently engaged to m': if w prefers m to m', w becomes engaged to m and m' is set free; if not, m continues free and proposes to the next woman on his list. When no man is free, the engaged pairs are matched. END.

Figure 5.1: The GS algorithm
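The GS flowchart translates almost line-for-line into code (a minimal sketch; ports are 0-based, preference lists are ordered most-preferred first, and the 2x2 example below is hypothetical):

```python
def gale_shapley(men_prefs, women_prefs):
    """Men propose in order of preference; each woman holds the best
    proposal seen so far. Returns a stable matching {man: woman}."""
    n = len(men_prefs)
    # rank[w][m] = position of man m on woman w's list (lower = preferred)
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_prop = [0] * n          # next woman on each man's list
    fiance = [None] * n          # fiance[w] = man currently engaged to w
    free = list(range(n))
    while free:
        m = free.pop()
        w = men_prefs[m][next_prop[m]]
        next_prop[m] += 1
        if fiance[w] is None:
            fiance[w] = m                        # w was free: engage
        elif rank[w][m] < rank[w][fiance[w]]:    # w prefers m to m'
            free.append(fiance[w])               # m' is set free
            fiance[w] = m
        else:
            free.append(m)                       # m continues free
    return {m: w for w, m in enumerate(fiance)}

# Both men prefer woman 0; woman 0 prefers man 1, so man 0 ends with woman 1.
print(gale_shapley([[0, 1], [0, 1]], [[1, 0], [0, 1]]))
# {1: 0, 0: 1}
```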

Figure 5.2: One RRM cycle on a 4 x 4 switch, showing the accept pointers a_i and grant pointers g_j.

5.2 Round-robin matching

Each input i maintains an accept pointer a_i. Each output j maintains a grant pointer g_j.

RRM cycle.

Step 1 Each input i requests every output j with L_ij > 0.

Step 2 Each output j grants the next requesting input at or after the current pointer value g_j, then increments g_j to one beyond the granted input (mod N).

Step 3 Each input i accepts the next granting output at or after the current pointer value a_i. If a grant has been accepted, a_i is incremented to one beyond the accepted output (mod N).

Figure 5.2 illustrates one RRM cycle. Initially, all a_i = 1 and all g_j = 1. At the end of this cycle, the match and the updated pointer values are as shown in the figure.
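The three RRM steps can be sketched in a few lines (a simplified sketch with 0-based ports; `requests[i]` is the set of outputs input i has cells for, and the pointers are updated in place):

```python
def rrm_cycle(requests, grant_ptr, accept_ptr):
    """One RRM cycle on an N x N switch. requests[i] = set of outputs
    requested by input i; grant_ptr / accept_ptr are length-N lists."""
    n = len(requests)
    grants = {}                                  # output j -> granted input
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if j in requests[i]:
                grants[j] = i
                grant_ptr[j] = (i + 1) % n       # RRM: advance unconditionally
                break
    granted = {}                                 # input i -> outputs granting i
    for j, i in grants.items():
        granted.setdefault(i, set()).add(j)
    match = {}                                   # input -> output
    for i, outs in granted.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match[i] = j
                accept_ptr[i] = (j + 1) % n      # advance only on accept
                break
    return match

# Fully loaded 2x2 switch: the grant pointers stay synchronized, so only
# one connection is made per slot (50 percent throughput).
reqs = [{0, 1}, {0, 1}]
g, a = [0, 0], [0, 0]
matches = [rrm_cycle(reqs, g, a) for _ in range(4)]
# matches == [{0: 0}, {1: 0}, {0: 1}, {1: 1}]
```

Running it on a fully loaded 2 x 2 switch reproduces the synchronization pathology analyzed in the next subsection.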

5.2.1 Analysis of RRM

Under heavy load, the grant counters may get synchronized, reducing utilization. Consider a 2 x 2 switch with all L_ij > 0. Then it is possible for only one connection to be made in every slot, as follows.

Match {(1,1)}
Match {(2,1)}
Match {(1,2)}
Match {(2,2)}

At the end of the fourth cycle the situation repeats. Throughput is 50 percent.

Of course the following TDM cycle is also possible, and has throughput of 100 percent.

Match {(1,1),(2,2)}
Match {(1,2),(2,1)}

Under heavy load, if the grant counters get synchronized at any time (i.e. have the same value), they'll stay synchronized forever.

Under light load, the grant counters will be randomly distributed, so the probability that some input is not served is small.

Figure 5.3: PIM can be unfair under heavy load. On a 2 x 2 switch the request rates are (1,1) = (1,2) = (2,1) = 1, but the acceptance rates are (1,1) = 1/4, (1,2) = 3/4, (2,1) = 3/4.

5.3 Parallel iterative matching, PIM

Step 1 Each unmatched input i sends requests to every output j such that L_ij > 0.

Step 2 Each output j randomly picks one input from the received requests and grants it.

Step 3 Each input randomly accepts one of the received grants.

The I in PIM means that this cycle is iterated to improve the match.

5.3.1 Analysis of PIM

It appears that with uniform iid traffic, PIM achieves a maximal match in 3 iterations.

In heavy load, every input requests every output. The probability that a given input receives no grant in one iteration equals (1 - 1/N)^N, which tends to 1/e as N grows, so a single iteration achieves about 63 percent throughput.

PIM can be unfair. Figure 5.3 gives a 2 x 2 case where the request rate from input i to output j is 1 for (i, j) = (1,1), (1,2) and (2,1). So these three requests are made in each slot. In the first iteration, output 1 grants inputs 1 and 2 with probability 1/2 each, while output 2 always grants input 1. Iterating to a maximal match, input 1 will accept output 1 with probability 1/4, and output 2 with probability 3/4; input 2 will accept output 1 with probability 3/4.

Thus even though arrival rates for output port 1 are equal at input ports 1 and 2, the acceptance rates are not the same.
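The acceptance rates of Figure 5.3 can be checked with a short simulation (a sketch, with 0-based ports and the three requests of the figure always backlogged; PIM is iterated to a maximal match each slot):

```python
import random

def pim_slot(requests, rng):
    """Iterate PIM (random grant, random accept) until no grants remain;
    the result is a maximal match for the request pattern."""
    match = {}
    free_in = set(range(len(requests)))
    matched_out = set()
    while True:
        # Step 2: each unmatched, requested output grants one input at random.
        grants = {}
        wanted = {j for i in free_in for j in requests[i]} - matched_out
        for j in sorted(wanted):
            cand = [i for i in sorted(free_in) if j in requests[i]]
            grants.setdefault(rng.choice(cand), []).append(j)
        if not grants:
            return match
        # Step 3: each input accepts one of its grants at random.
        for i, outs in grants.items():
            j = rng.choice(outs)
            match[i] = j
            free_in.discard(i)
            matched_out.add(j)

rng = random.Random(0)
requests = [{0, 1}, {0}]          # the pattern of figure 5.3, 0-based
counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0}
slots = 20_000
for _ in range(slots):
    for i, j in pim_slot(requests, rng).items():
        counts[(i, j)] += 1
rates = {k: v / slots for k, v in counts.items()}
# rates approach 1/4, 3/4, 3/4 for (1,1), (1,2), (2,1) respectively
```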

5.4 iSLIP matching

The detailed reference is [4]. RRM suffers from synchronization of the grant counters. iSLIP modifies RRM slightly so that a grant counter is incremented only if its grant is accepted. So step 2 of RRM is modified:

Step 2 Each output j grants the next requesting input at or after the current pointer value g_j; it then increments g_j to one beyond the granted input only if that input accepts output j.
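The modified pointer rule is a two-line change to the RRM sketch above (a simplified sketch with 0-based ports; both pointers now move only after an accept):

```python
def islip_cycle(requests, grant_ptr, accept_ptr):
    """One iSLIP iteration: like RRM, but g_j advances only if its
    grant is accepted, so pointers are updated after the accept step."""
    n = len(requests)
    grants = {}                               # output j -> input it grants
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if j in requests[i]:
                grants[j] = i                 # do NOT move g_j yet
                break
    granted = {}                              # input -> outputs granting it
    for j, i in grants.items():
        granted.setdefault(i, set()).add(j)
    match = {}
    for i, outs in granted.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match[i] = j
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n    # advance only on accept
                break
    return match

# Fully loaded 2x2 switch: after one warm-up slot the pointers
# desynchronize and every later slot carries a full match.
reqs = [{0, 1}, {0, 1}]
g, a = [0, 0], [0, 0]
sizes = [len(islip_cycle(reqs, g, a)) for _ in range(4)]
# sizes == [1, 2, 2, 2]
```

On the same fully loaded 2 x 2 pattern where RRM stays at 50 percent, the pointers desynchronize after the first slot and throughput goes to 100 percent.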

5.4.2 Priority iSLIP

Suppose there are P priority levels. Then each input i maintains P x N VOQs, with L^p_ij the buffer occupancy of priority p for output j. The scheduler gives strict priority, i.e. serves L^p_ij only if L^q_ij = 0 for every priority q higher than p. Each input maintains a counter a^p_i and each output maintains a counter g^p_j for each priority level.

Step 1 Each input i requests output j at the highest priority level p with a non-empty queue for j.

Step 2 Output j determines the highest priority level p among the requests it received. The output then chooses one input among those that have requested at level p, using the separate pointer g^p_j in the same round-robin scheme. The output notifies each input whether or not its request is granted. The pointer g^p_j is incremented to one beyond the granted input only if the granted input accepts output j.

Step 3 If input i receives any grants, it determines the highest priority level among them, say p. The input then chooses one grant among the grants at this level, according to the counter a^p_i, which is incremented to one beyond the accepted output. The input then notifies each output whether or not its grant was accepted.

5.4.3 Threshold iSLIP

It may be better to select a weighted maximal match with weights corresponding to queue length. If queue lengths are quantized by threshold levels T_1 < T_2 < ... < T_P, then priorities may be assigned accordingly: a VOQ gets priority p when T_p <= L_ij < T_{p+1}, and priority iSLIP is applied.

5.4.4 Weighted iSLIP

Suppose the bandwidth from input i to output j is to be shared according to ratios w_ij, subject to sum_i w_ij <= 1 and sum_j w_ij <= 1.

In iSLIP each counter is an ordered circular list (1, 2, ..., N). Now expand the list at output j to length W_j, where W_j is the lowest common denominator of the w_ij, and input i appears w_ij x W_j times in the list.
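The expanded circular list can be sketched as follows (assuming the weights are given as exact fractions and W_j is taken as the least common multiple of their denominators; a real scheduler would interleave the repeated entries for smoothness):

```python
from fractions import Fraction
from math import lcm

def expanded_grant_list(weights):
    """weights[i] = fraction of the output's bandwidth for input i.
    Returns the circular grant list in which input i appears
    w_i * W times, W = lcm of the weight denominators."""
    W = lcm(*(w.denominator for w in weights))
    order = []
    for i, w in enumerate(weights):
        order += [i] * (w.numerator * W // w.denominator)
    return order

# Three inputs sharing one output 1/2 : 1/4 : 1/4 -> a list of length 4.
print(expanded_grant_list([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))
# [0, 0, 1, 2]
```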

Figure 5.4: Interconnection of the input and output arbiters to construct the iSLIP scheduler: the state of the input queues (N^2 bits) feeds the N grant arbiters, whose decisions feed the N accept arbiters and then the decision register.

5.4.5 Implementation

Figure 5.4 shows how the iSLIP scheduler for an N x N switch is constructed from the input and output arbiters.

The state memory records whether each input queue is empty. From this memory, an N^2-bit wide vector presents N bits to each of the N output grant arbiters, representing Step 1 (request).

The grant arbiters select a single input among the contending requests to implement Step 2 (grant).

The grant decisions are presented to the N accept arbiters, each of which selects at most one output on behalf of its input to implement Step 3 (accept).

The final decision is stored in the decision registers and the values of the g_j and a_i pointers are updated. The decision register is used to notify each input which cell to transmit and to configure the crossbar switch.

    Chapter 6

    Network processors

Figure 6.1 is a logical diagram of how a network processor (NP) fits in a system design. The NP is located between the physical layer (MAC or framer) and the switch fabric. In the figure the Serializer/Deserializer (SERDES) is the interface between the NP and the switch fabric. The framer or MAC presents a packet to the NP, which must examine it, parse it, do the necessary edits and database lookups to enforce various policies at layers 3-7 (forwarding, queuing, labels), and exchange messages with the switch controller. The NP is in the data path.

    Figure 6.1: Location of NP in a logical diagram. Source [17].

6.0.6 NP operation

Figure 6.2 shows a generic block diagram. Data of multiple physical interfaces or the switch fabric are transferred to/from the NP. The bitstream processors receive the serial stream of packet data and extract the information needed to process the packet, such as MAC or IP source/destination addresses, TOS bits, TCP port numbers, and MPLS or VLAN tags. The packet is then written into the packet buffer memory. The extracted information is fed to the processor complex, the programmable unit of the NP. Under program control, the processor may extract additional information from the packet and submit relevant information to the search engine, which looks up the MAC or IP address, classifies the packet, or does a VCI/VPI lookup using the routing/bridging tables. Upon packet transmission through the bitstream processor, the necessary modifications to the packet header are performed.

Figure 6.2: Generic NP architecture: bitstream processors and a packet buffer memory at the PHY/switch-fabric interface; a processor complex with search engine and HW assists, backed by the routing and bridging tables; a buffer manager/scheduler; and a general-purpose CPU. Source [15].

Figure 6.3: Time to process 40B packets at different line rates. Source [17].

6.0.7 Speed of operations

Figure 6.3 shows the time available to process back-to-back 40B packets at different line speeds. At 1 Gbps, the time to process one packet is 360 ns. Using 10-ns SRAM permits a maximum of 36 memory accesses. Thus faster line rates can be accommodated only by processing several packets simultaneously, in a pipelined or parallel fashion.

6.0.8 Packet buffer memory

For the architecture of figure 6.2, each packet byte may traverse the memory interface at least four times:

write inbound packet to memory
read header into the processor complex
write modified header back to memory
read packet for outbound transmission

So for 40-byte back-to-back packets the required memory interface capacity is at least four times the line rate: 10-160 Gbps for line rates of 2.5-40 Gbps.
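This four-traversal budget is just a multiplier on the line rate (a sketch; the factor of four follows from the traversal count stated above):

```python
def min_memory_bandwidth_gbps(line_rate_gbps, traversals=4):
    """Minimum buffer-memory interface capacity when each packet byte
    crosses the interface `traversals` times (write in, read header,
    write back, read out)."""
    return traversals * line_rate_gbps

demands = [min_memory_bandwidth_gbps(r) for r in (2.5, 10.0, 40.0)]
# demands == [10.0, 40.0, 160.0] Gbps
```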

Chapter 7

Distributed switch
Figure 7.1: A distributed switch is a network of switches (each link of capacity 1) with a certain number of input and output ports: N input nodes and M output nodes.

7.2 Clos network

This is a 3-stage network as illustrated in Figure 7.2. The Clos network is specified by 5 numbers (n_IN, r_IN, m, n_OUT, r_OUT): there are r_IN + m + r_OUT switches arranged in 3 stages, with r_IN input switches of size n_IN x m, m middle switches of size r_IN x r_OUT, and r_OUT output switches of size m x n_OUT. The number of input-output ports and connectivity of the switches are as shown.

Theorem A Clos network with RNB switch modules is RNB iff m >= max(n_IN, n_OUT). A Clos network with SNB switch modules is SNB iff m >= n_IN + n_OUT - 1.

The total number of input lines is n_IN x r_IN. The total number of output lines is n_OUT x r_OUT.

The Clos network in the figure is SNB. It has 9 input lines and 8 output lines.
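The two nonblocking conditions are one-liners to encode (a sketch assuming the classical Clos conditions as stated in the theorem above):

```python
def clos_is_rnb(n_in, m, n_out):
    """Rearrangeably nonblocking: m middle switches suffice to rearrange
    any set of connections iff m >= max(n_in, n_out)."""
    return m >= max(n_in, n_out)

def clos_is_snb(n_in, m, n_out):
    """Strictly nonblocking (Clos 1953): m >= n_in + n_out - 1,
    so any new connection can be added without rearrangement."""
    return m >= n_in + n_out - 1

# A symmetric Clos with n = 3 needs m = 5 middle switches for SNB,
# but only m = 3 for RNB.
print(clos_is_snb(3, 5, 3), clos_is_rnb(3, 3, 3))
# True True
```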

7.3 Recursive construction

Figure 7.4: Recursive construction of an RNB Clos network with N = p x q: two outer stages of q switches of size p x p each, connected through a middle stage of p planes of size q x q.

Figure 7.4 is an N x N RNB switch if each module is RNB.

Figure 7.5: The Benes switch: recursing the construction down to 2 x 2 modules gives 2 log2 N - 1 stages of N/2 2 x 2 switches, built from two N/2 x N/2 Benes switches.

Figure 7.5 is an N x N RNB switch made up of 2 x 2 switch modules.
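The stage and switch counts of the Benes construction follow directly (a sketch assuming N is a power of two, as the recursion requires):

```python
def benes_stages(n):
    """An N x N Benes network has 2*log2(N) - 1 stages."""
    k = n.bit_length() - 1        # log2(n) for n a power of two
    return 2 * k - 1

def benes_switch_count(n):
    """Each stage holds N/2 switches of size 2 x 2."""
    return benes_stages(n) * (n // 2)

# An 8 x 8 Benes network: 5 stages of four 2x2 switches = 20 modules,
# versus 64 crosspoints for an 8 x 8 crossbar.
print(benes_stages(8), benes_switch_count(8))
# 5 20
```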

Figure 7.7 illustrates an algorithm to rearrange existing connections in order to accommodate a new connection.

Question 1: Can you supply a proof?

Question 2: Is there an algorithm to accommodate new connections in an arbitrary network of Figure 7.1?

Figure 7.7: Algorithm to add a new connection for an RNB switch

In a Benes switch, feasible flows may require multiple paths. Figures 7.8 and 7.9 show this.

Figure 7.8: Split flow 1. A 4 x 4 Benes network carrying flows of rates e, 1 - e, 1 - 2e and 2e, which must be split across multiple paths.

Figure 7.9: Split flow 2. The same 4 x 4 Benes network with flow rates e, 1 - e, 1 - 2e and 2e, routed differently but again split.

Figure 7.10: Max flow for a single commodity is 3 and the flows are integer; in the multi-commodity case, the max flows are 0.5 and non-integer.

In a Clos switch, permutations can be achieved without splitting flows. In a general multi-commodity network this is not so. Figure 7.10 shows that if this is a single-commodity problem, the maximum flow is 3 and all flows are 1 (integer).

However, if the three source-destination pairs are treated as distinct commodities, the max flows are 0.5 each, and not integer.

Figure 7.11: Two copies of figure 7.10 are connected in parallel. Achieving flows of 1, 1, 1 requires splitting.

Figure 7.11 shows that a feasible permutation may require splitting flows. The green and cyan flows must be connected in parallel similarly to the red flow.

    Bibliography

[1] J. Walrand and P. Varaiya. Chapter 12, Switching. High Performance Communication Networks, 2nd edition, 2000.

[2] M.J. Karol, M. Hluchyj and S. Morgan. Input vs output queueing on a space-division packet switch. IEEE Trans Comm, COM-35(12): 1347-56, Dec. 1987.

    [3] T.E. Anderson, S. Owicki, J. Saxe and C.P. Thacker. High-speed scheduling for local area

    networks. ACM Trans Computer Systems, 11(4):319-52, Nov. 1993.

[4] N. McKeown. iSLIP: a scheduling algorithm for input-queued switches. IEEE Trans Networking, 7(2), April 1999.

[5] N. McKeown, V. Anantharam and J. Walrand. Achieving 100% throughput in an input-queued switch. Proc. Infocom 96, vol 1: 296-302.

    [6] B. Prabhakar and N. McKeown. On the speedup required for combined input and output

    queued switching. Automatica, 35(12), Dec. 1999

    [7] J.F. Hayes, R. Breault and M.K. Mehmet-Ali. Performance analysis of a multicast switch.

    IEEE Trans Comm, COM-39(4): 581-87, April. 1991.

    [8] B. Prabhakar, N. McKeown and R. Ahuja. Multicast scheduling for input-queued switches. J.

    Selected Areas in Comm 15(5):855-66, June 1997.

[9] M. Waldvogel, G. Varghese, J. Turner and B. Plattner. Scalable high speed IP routing lookups. ACM Sigcomm 97, September 1997.

    [10] A. Demers, S. Keshav and S. Shenker. Analysis and simulation of a fair queueing algorithm.

    ACM Sigcomm 89 Computer Communication Review, 19(4): 1-12, 1989.

[11] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of integrated services networks: the single node case. IEEE Trans Networking, 1(3): 344-57, June 1993.

[12] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of integrated services networks: the multiple node case. IEEE Trans Networking, 2(2): 137-50, April 1994.

[13] S. Floyd and V. Jacobson. Random early detection. IEEE Trans Networking, 1(4): 397-413, August 1993.

    [14] I. Stoica, S. Shenker and H. Zhang. Core-stateless fair queuing: achieving approximately fair

    bandwidth allocations in high speed networks. ACM Sigcomm 98, 1998.

    [15] W. Bux, W.E. Denzel, T. Engbersen, et al. Technologies and building blocks for fast packet

    forwarding. IEEE Communications Magazine, 39(1): 70-77, January 2001.

    [16] P.R. Kumar and S. Meyn. Stability of queuing networks and scheduling policies. IEEE Trans.

    Automatic Control, 40(2), February 1995.

    [17] A. Deb. Building a network-processor based system. Integrated Communications Design,

    December 2000. Available at www.icdmag.com.