Fast Switches
Switch Fabric Architecture Fast Datagram Switches Higher-Layer and Active Processing (From Kwangwoon Univ.)
Introduction
• Path determination and setup
– Centralized control: single point of control
– Distributed control: per-input-port processing
– Self-routing: autonomous control
– Distributed control & self-routing
• Advantage: does not limit scalability
• Disadvantage: global optimization is difficult
• Blocking characteristics
– Strictly nonblocking
– Wide-sense nonblocking: a switching algorithm sets the path
– Rearrangeably nonblocking: existing paths may be rearranged
– Virtually nonblocking: low probability of blocking
• Nonblocking switch fabric principle: avoid blocking by space-division parallelism, internal speedup, and internal pipelined buffering with cut-through
Buffering
• Why buffering?
– If all traffic were uniform, buffering would not be needed.
– If traffic is bursty, however, buffering is needed; otherwise colliding packets would be dropped.
[Figure: two inputs (IN 1, IN 2) contend for one output; colliding packets are delayed in buffers]
Buffering
• Buffer Location
– Unbuffered
• Undesirable for fast packet switches, since there is nowhere to hold contending packets.
• Suitable for optical components, because there is no practical way to queue light.
– Ways of dealing with contention in an optical burst switch:
» Drop the burst and retransmit end-to-end.
» Deflect the burst by scheduling.
» Convert the burst to the electronic domain for queueing.
– Internally buffered
• Increases complexity
– Input or output queued
– Input AND output queued
– Shared buffer switch
• A logical partitioning of physical memory
Buffering
• Buffer location– Unbuffered vs internally buffered
Buffering
• Buffer Location– Input or output buffered switches.
Buffering
• Buffer Location– Combined input/output buffered switch
Buffering
• Buffer Location– Shared buffer switch
Buffering
• Head-of-line blocking
– Input queueing
• Input queueing holds packets until the switch can direct them to the appropriate output.
– Output queueing
• As in a shared-medium network, due to contention from other network nodes for the MAC.
– Speedup (S): the ratio of internal to external data rates
– Internal buffering
– Internal expansion: Clos fabric
• Head-of-line blocking avoidance principle
• Output queueing requires internal speedup, expansion, or buffering. Virtual output queueing requires additional queues and queueing complexity. The two techniques must be traded against one another, and can be used in combination.
Buffering
• Head-of-Line Blocking
– Input versus output queueing
Buffering
• Head-of-line blocking– Clos fabric
Buffering
• Virtual Output Queueing
– This scheme requires that packets be multiplexed and timestamped to determine the arrival order among the queues at each input.
– A scheduling algorithm can then determine which packets to accept so as to form a set of nonconflicting outputs.
– Disadvantage
• Wastes buffer space
– Trade-offs
• Increased memory density makes more queues practical.
• Increased logic density makes more complex scheduling hardware practical.
Buffering
• Virtual Output Queueing
– Head-of-line blocking can be eliminated.
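To make the elimination concrete, here is a minimal Python sketch of virtual output queueing with one greedy matching pass per time slot. The packet names and the input-order-priority matcher are illustrative; real switches use iterative matchers such as iSLIP.

```python
from collections import deque

def voq_schedule(voqs, n_ports):
    """One scheduling slot: voqs[i][j] holds input i's packets for output j.
    A greedy pass matches each input to the first free output it has
    traffic for, so at most one packet leaves each input and enters each
    output per slot."""
    used_outputs = set()
    matches = []
    for i in range(n_ports):
        for j in range(n_ports):
            if j not in used_outputs and voqs[i][j]:
                matches.append((i, j, voqs[i][j].popleft()))
                used_outputs.add(j)
                break
    return matches

# Each input holds packets for outputs 0 and 1. With a single FIFO per
# input, input 1's head packet (for busy output 0) would block the packet
# behind it even though output 1 is idle. With VOQs, both outputs are
# served in the same slot.
voqs = {0: {0: deque(["A"]), 1: deque(["B"])},
        1: {0: deque(["C"]), 1: deque(["D"])}}
print(voq_schedule(voqs, 2))  # [(0, 0, 'A'), (1, 1, 'D')]
```

The greedy pass is the simplest possible arbiter; its point here is only that keeping a queue per output lets an input offer any of its waiting packets to the matcher, not just the head of one FIFO.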
Single-Stage Shared Elements
• Bus Interconnects
[Figure: shared bus of bandwidth w connecting inputs i0–i7 to outputs o0–o7]
Single-Stage Shared Elements
• Bus Interconnects
– Packets must wait in input queues until the bus is free.
– Aggregate throughput: each port's rate r_i ≤ w/n (w: bus bandwidth, n: number of ports).
– Bus speedup is limited by the available electronic technology.
– Multicast is naturally supported.
• Ring Switches
– Throughput can be higher than a bus due to better ring utilization under the MAC protocol and the isolation of electrical effects.
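The bus bandwidth bound above is a one-line calculation; the 40 Gb/s and 16-port figures below are illustrative, not from the slides.

```python
def max_port_rate(bus_bw, n_ports):
    """Shared bus of bandwidth w divided among n ports: each port's
    sustained rate r_i can be at most w / n."""
    return bus_bw / n_ports

# A 40 Gb/s bus shared by 16 ports supports at most 2.5 Gb/s per port.
print(max_port_rate(40, 16))  # 2.5
```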
Single-Stage Shared Elements
• Shared Memory Fabrics
[Figure: shared-memory fabric; inputs I0–I7 are multiplexed into a shared memory, then demultiplexed to outputs o0–o7]
Single-Stage Shared Elements
• Shared Memory Fabrics
– Difficulties
• Memory density is increasing exponentially, but memory access times are not.
• A packet must typically be completely stored in memory before being output.
– Multicast
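The access-time difficulty can be quantified: in the worst case all n ports write and all n ports read within one word time, so the memory needs 2n accesses per word. A sketch with illustrative numbers (port count, line rate, and word width are assumptions, not figures from the slides):

```python
def required_cycle_ns(n_ports, line_rate_gbps, word_bits):
    """Worst-case shared-memory cycle time: one word of word_bits arrives
    per port every word_bits/line_rate nanoseconds, and the memory must
    complete 2*n accesses (n writes + n reads) in that interval."""
    word_time_ns = word_bits / line_rate_gbps   # bits / (Gbit/s) = ns
    return word_time_ns / (2 * n_ports)

# 32 ports at 10 Gb/s with 512-bit words: the memory must cycle every 0.8 ns,
# which is why access time, not density, limits shared-memory fabrics.
print(required_cycle_ns(32, 10, 512))  # 0.8
```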
Single-Stage Space Division Elements
• Basic Switch Element
– Electronic Switch Elements: 2×2 switch element
[Figure: 2×2 switch element states (straight, cross, duplicate); control logic selects paths from inputs i0, i1 through packet buffers with cut-through and an output multiplexor to outputs o0, o1]
Single-Stage Space Division Elements
• Basic Switch Element
– Electronic Switch Elements: 2×2 self-routing switch element
[Figure: as above, but each input first passes a header decoder and a delay element, so the element configures itself from the packet header]
Single-Stage Space Division Elements
• Basic Switch Element
– Optical Switch Elements
[Figure: 2×2 optical switch element; electrodes switch inputs i0, i1 between the cross state and the straight state (voltage applied)]
Single-Stage Space Division Elements
• Crossbar
– Crossbar switch point states
[Figure: switch point at the crossing of input i_i and output column o_j; electronic crosspoints and optical MEMS mirrors support cross, turn, and duplicate states]
Single-Stage Space Division Elements
• Crossbar
– Advantage
• Simplicity and regularity
– Disadvantage
• Scaling complexity: O(n²) crosspoints
• Simple model of the cost in chip area
– A = a_c + n(a_i + a_o) + n²·a_x
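A quick evaluation of this area model shows the n² crosspoint term dominating as the switch grows; the unit-area coefficients below are illustrative.

```python
def crossbar_area(n, a_c, a_i, a_o, a_x):
    """A = a_c + n*(a_i + a_o) + n^2 * a_x: fixed control area, per-port
    input/output driver area, and one crosspoint per input-output pair."""
    return a_c + n * (a_i + a_o) + n * n * a_x

# With unit coefficients, growing from 16 to 64 ports inflates the area
# roughly 14x, driven almost entirely by the n^2 crosspoint term.
print(crossbar_area(16, 10, 1, 1, 1))  # 298
print(crossbar_area(64, 10, 1, 1, 1))  # 4234
```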
Single-Stage Space Division Elements
• Crossbar– Crossbar switch
[Figure: 8×8 crossbar switch connecting inputs I0–I7 to outputs o0–o7]
Multistage Switches
• Tiling Crossbars
– Tile switch elements in a square array.
– Not a cost-effective solution for large switches.
• Multistage Interconnection Networks (MINs)
– Delta switch
• Advantages
– Eliminates central switch control (self-routing).
– Preserves packet sequence, since all cells of a connection take the same path.
• Disadvantage
– Load is not distributed.
– Benes switch
• Dynamically routes packets using additional stages.
– Packets are resequenced using a timestamp inserted into the internal switch header.
– Banyan switch
• Built from shared-memory and crossbar switches.
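Self-routing through a delta or banyan network can be sketched in a few lines: stage k of 2×2 elements examines bit k of the destination address and needs no central controller. The MSB-first bit convention here is one common choice, assumed for illustration.

```python
def self_route(dest, n_stages):
    """Per-stage output choice for a self-routing packet: at stage k the
    2x2 element forwards to its output 0 or output 1 according to bit k
    (most significant first) of the destination address."""
    return [(dest >> (n_stages - 1 - k)) & 1 for k in range(n_stages)]

# Routing tag for destination 0b1010 (output 10) in a 16-port, 4-stage
# network: the packet exits each successive stage on output 1, 0, 1, 0.
print(self_route(0b1010, 4))  # [1, 0, 1, 0]
```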
Multistage Switches
• Multistage Interconnection Networks – Delta switch
[Figure: 16×16 delta switch; stages of 2×2 elements route inputs I0–I15 to outputs o0–o15]
Multistage Switches
• Multistage Interconnection Networks – Benes switch
[Figure: 16×16 Benes switch with stages s0–s5; the extra stages provide alternate paths, and the tag 1010 illustrates self-routing]
Multistage Switches
• Multistage Interconnection Networks – Banyan switch
[Figure: 16×16 banyan switch built from two stages S0, S1 of larger elements connecting I0–I15 to o0–o15]
Multistage Switches
• Multistage Interconnection Networks – Optical Multistage Networks
• Incapable of buffering: requires nonblocking, bufferless interconnection fabrics.
• Crosstalk problem: addressed by dilation techniques.
• Dilated Benes switch
[Figure: dilated Benes element in pass and cross states]
Multistage Switches
• Scaling Speed (parallel switch slices)
[Figure: inputs i0–i(n−1) are striped across parallel fabric slices σ0–σ(m−1); the data path is delayed while fabric control computes the per-port controls c0–c(n−1), and the slices recombine at outputs o0–o(n−1)]
Multicast Support
• Crossbar Switch Multicast
– Service disciplines
• No fanout splitting: all copies are delivered in one slot, so the packet blocks if any of its outputs is busy.
• Fanout splitting: copies may be delivered across multiple slots.
– Goals of the service schedule
• High throughput.
• Some fairness measure is met; in particular, packets should not be starved.
• The scheduling discipline can be implemented at high speed (line rate).
– A variety of scheduling disciplines are possible
• Concentrate the residue among as few inputs as possible.
• Weight based.
Multicast Support
• Crossbar Switch Multicast scheduling
[Figure: five inputs I1–I5 with multicast fanout sets (e.g. {1,3,5}) are served across outputs o1–o5 over successive slots, leaving a residue to be concentrated]
Multicast Support
• Multistage Fabric Multicast
[Figure: 16×16 multistage fabric; copy stages replicate the packet, a translate step assigns output addresses (e.g. 0000, 0100, 1010, 1110), and routing stages deliver the copies to outputs o0–o15]
Review – Fast Packet Switching
• 1980s: link-rate technology improvements.
• Connection-oriented fast packet switching technologies for high-speed networks.
• 1990s: widely deployed.
– e.g. ATM for high-speed backbone networks
• Benefits (5.3)
– Simplified packet processing and forwarding.
– Eliminated the store-and-forward latency.
– Provided QoS guarantees and resource reservation.
Fast Datagram Switches
• Factors that resisted the global deployment of connection-oriented networks:
– The IP-based Internet and the WWW.
– The limitations of shared-medium link protocols were overcome.
• Fast Datagram Switches
– Motivation
• Maintain high performance.
• Support connectionless networks.
– Derivation
• Complexity moves to switch input and output processing.
Fast Packet Switching Architecture
[Figure: input processing (CID table lookup and label swap) feeds the switch fabric, governed by fabric control and by routing and signaling; output processing performs link scheduling onto the output links]
Connection-Oriented vs Connectionless
• Similarity
– At a high level, each switch has the same functional blocks.
– e.g. routing, signaling, management…
• Difference
– Input processing
• Address lookup using a prefix table.
• Packet classification.
– Output processing
• Packet scheduling to meet QoS requirements.
Architecture of Fast Datagram Switching
[Figure: input processors extract headers and hand them to forwarding engines, which perform header processing against prefix tables; packets cross the switch fabric, governed by fabric control and by routing and signaling, to output scheduling on each link]
Packet Processing Rates
• Designing a switch
– Datagram size: min 40 bytes to max 1500 bytes.
– Rule of thumb: design for the average packet size.
• Form of processing
– Sequential processing is sized for the minimum packet size.
– Parallel processing is sized for the average packet size.
• Packet processing rate
The packet processing rate is a key throughput measure of a switch. Packet processing software and shared parallel hardware resources must be able to sustain the average packet processing rate.
Fast Forwarding Lookup
• Review – fast packet switching
– CIDs are used for the fast packet switching lookup.
– Problem: table entry size.
• Fast datagram switching
– Problem: similar to fast packet switching.
– Solutions
• Flat addressing
• Hierarchical addresses
• Software prefix matching
• Hardware matching support
• Source routing
Flat Addressing
[Figure: the destination address adest in the packet header is matched against an <adest, pout> table, by hardware match or software search, to select the output port pout]
Figure 5.50 Address lookup
Software Search
• Lookup time
– Worst case: minimum packet size, worst-case lookup algorithm.
• Memory required
– Trade-off: performance vs. cost.
– The amount of memory it is reasonable to contain in the switch input processing.
• Update time
– Depends on the lookup data structure.
• Techniques
– Tree search: O(log_B N) for N entries, where B is the branching factor.
– Hash function: O(1) when there are no hash collisions.
Content Addressable Memory (CAM)
• Features
– Parallel scan of all entries
– Memory accessed by reference to a key
– Returns the associated data
• Benefit
– Intuitive and fast
• Model
– Each word consists of a <search-field, return-field> pair.
– All words are checked in parallel in a single CAM cycle.
– The return-field portion of the matching word is the output of the CAM read.
– CAMs are specifically designed for network address lookup.
[Figure: a key is associated against stored <key, data> words in parallel, returning the matching word's data]
Hierarchical Addresses
• Hierarchy is exploited to reduce the size of the forwarding tables.
• Forwarding entries can be represented as address prefixes.
• A prefix is the higher-order bit portion of an address that must be matched to lead toward the destination.
• Similar to the PSTN numbering plan.
Software Prefix Matching
[Figure: the address 101 011 01 is looked up in a <prefix, pout, fstate> table holding *, 00*, 001*, 0001*, 0101*, 101*, 10100*, 11*, and 111*; the matching entry supplies pout, and the header's hop count and checksum are updated]
Figure 5.52 IP Prefix matching
Basic Prefix Matching Algorithm
[Figure: binary trie over the same prefixes; the lookup of 101 011 01 descends matching branches and returns the longest marked prefix, 101*]
Figure 5.53 Trie prefix matching
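The trie lookup of Figure 5.53 can be sketched as a binary trie in Python. The prefixes and the 101 011 01 lookup mirror the figure; the nested-dict representation and the port names p1–p8 are illustrative.

```python
def trie_insert(trie, prefix, port):
    """Store next-hop `port` under a bit-string prefix like '101'."""
    node = trie
    for bit in prefix:
        node = node.setdefault(bit, {})
    node["port"] = port

def longest_prefix_match(trie, addr):
    """Walk the trie bit by bit, remembering the last node that carried
    a next hop: that node is the longest matching prefix."""
    node, best = trie, trie.get("port")
    for bit in addr:
        node = node.get(bit)
        if node is None:
            break
        best = node.get("port", best)
    return best

table = {"port": "default"}  # the * entry
for prefix, port in [("00", "p1"), ("001", "p2"), ("0001", "p3"),
                     ("0101", "p4"), ("101", "p5"), ("10100", "p6"),
                     ("11", "p7"), ("111", "p8")]:
    trie_insert(table, prefix, port)

# As in the figure, 101 011 01 matches 101* (10100* fails at the fifth bit).
print(longest_prefix_match(table, "10101101"))  # p5
```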
Hardware Matching Support
• Motivation
– The complexity of software algorithms.
• Hardware techniques for line-rate lookup
– Assisting logic can be embedded in the memory.
• CAMs for variable-length prefixes.
– Translation logic can be provided that assists the location of addresses in conventional memory.
• Multistage lookup.
CAMs for Variable Prefixes
[Figure: a ternary CAM stores <prefix, pout, fstate> entries with don't-care bits (00XXXX, 001XXX, 0001XX, 0101XX, 101XXX, 10100X, 11XXXX, 111XXX); all entries are compared against 101 011 01 in parallel, and a priority mux selects the longest match]
Figure 5.54 Ternary CAM prefix matching
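The parallel compare plus priority mux of Figure 5.54 can be modeled in software; the loop below stands in for the hardware's single-cycle parallel match, and the entries are the figure's prefixes with illustrative port names.

```python
def tcam_lookup(entries, addr):
    """Ternary CAM model: each entry (prefix, data) matches when the
    address begins with the prefix (trailing bits are X don't-cares).
    Hardware checks every entry simultaneously; the priority mux keeps
    the longest, i.e. most specific, match."""
    best_len, best_data = -1, None
    for prefix, data in entries:
        if addr.startswith(prefix) and len(prefix) > best_len:
            best_len, best_data = len(prefix), data
    return best_data

entries = [("", "p0"), ("00", "p1"), ("001", "p2"), ("0001", "p3"),
           ("0101", "p4"), ("101", "p5"), ("10100", "p6"),
           ("11", "p7"), ("111", "p8")]
print(tcam_lookup(entries, "10101101"))  # p5
```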
Multistage Lookup
[Figure: the leading bits of address 101 011 01 index a short prefix table, which yields either pout directly or an index into a block of a long prefix table; the long table resolves the remaining bits to pout]
Figure 5.55 Multistage prefix match
Source Routing
• Eliminates the per-hop address lookup by precomputing the route.
• The entire path is included in the packet header.
[Figure: a label stack p5, p0, p6 in the header steers the packet through nodes 1, 2, and 3; each node pops the top label and forwards on that port]
Figure 5.56 Source routed label stack
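A minimal sketch of the label-stack forwarding in Figure 5.56; the port labels follow the figure, while the dict packet format is an illustrative assumption.

```python
def forward(packet):
    """At each hop, pop the top label off the stack and use it directly
    as the output port -- no address lookup is performed."""
    return packet["labels"].pop(0)

# The source precomputes the whole path: port p5 at the first node,
# p0 at the second, p6 at the third.
pkt = {"labels": ["p5", "p0", "p6"], "payload": "data"}
path = [forward(pkt) for _ in range(3)]
print(path)  # ['p5', 'p0', 'p6']
```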
Packet Classification
• Two other common forms:
– Separation of control packets.
– Separation of packets belonging to different traffic classes.
• General classification includes:
– Classification into a QoS traffic class.
– Policy-based routing.
– Security.
– Higher-layer switching functions.
– Active networking.
Packet Filtering Problem
• Classification occurs before queueing in the node.
• The general problem of classification:
[Figure: rules R0–R5 partition the space of (TOS, source address) values; the TOS and source-address fields of an arriving header select the matching rule]
Packet Classification Implementations
• Hardware classification
– Ternary CAMs can be used to match the rules in parallel.
– Similar to the address lookup.
• Software classification
– Forwarding table lookup (Section 5.1.1).
– "Grid of tries", "tuple space search".
• Preprocessed classifiers
– Preprocess all possible packet fields.
Output Processing and Packet Scheduling (1)
• Reasons to perform output scheduling
– Datagram traffic consists of:
• Guaranteed service classes.
• Best-effort traffic.
– Scheduling must be sufficient to meet delay and bandwidth bounds.
– Fair service among the best-effort flows.
– Congestion control mechanisms do not protect guaranteed service classes from the best-effort traffic.
Output Processing and Packet Scheduling (2)
• Fair queueing
– Packet Fair Queueing (PFQ).
– Weighted Fair Queueing (WFQ).
• Per-flow queueing
– Provides the highest degree of isolation.
– Fine-grained control is possible when per-flow queueing is used.
• Congestion control
– Large queue buildups increase delay, resulting in congestion.
– Discard packets to keep queues from building.
– e.g. RED (Random Early Detection)
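RED's drop decision can be sketched directly from its definition; the thresholds and maximum probability below are illustrative parameters, not values from the slides.

```python
def red_drop_probability(avg_q, min_th, max_th, max_p):
    """Random Early Detection: never drop below min_th, always drop at
    or above max_th, and in between raise the drop probability linearly
    up to max_p so queues are trimmed before they build."""
    if avg_q < min_th:
        return 0.0
    if avg_q >= max_th:
        return 1.0
    return max_p * (avg_q - min_th) / (max_th - min_th)

print(red_drop_probability(10, 20, 80, 0.1))  # 0.0  (below min_th)
print(red_drop_probability(50, 20, 80, 0.1))  # 0.05 (halfway up the ramp)
print(red_drop_probability(90, 20, 80, 0.1))  # 1.0  (above max_th)
```

Note that RED operates on a smoothed (exponentially averaged) queue length, not the instantaneous one; that averaging step is omitted here for brevity.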
Higher-Layer and Active Processing
• Active networking uses general classification techniques.
– First, identify packets for active processing.
– Then execute active applications in the network nodes on the identified packets, connections, or flows to provide the desired service.
• Motivation for active networking
– Open, flexible interfaces allow provisioning of new protocols and services.
• Condition for active networking
– It should not impede the non-active fast path.
Active Network Node Reference Model
[Figure: execution environments (EEs), including a management EE (MEE), run above the NodeOS; a packet filter diverts selected packets to active processing while others follow the normal forwarding path through the switch fabric under switch control]
Figure 5.58 Active network node reference model