Shivkumar KalyanaramanRensselaer Polytechnic Institute
1
High Speed Router Design
Shivkumar KalyanaramanRensselaer Polytechnic Institute
[email protected] http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part on slides of Nick McKeown (Stanford), S. Keshav (Cornell)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
2
Overview
IntroductionEvolution of High-Speed RoutersHigh Speed Router Components:
Lookup AlgorithmClassificationSwitching
Shivkumar KalyanaramanRensselaer Polytechnic Institute
3
What do switches/routers look like?
Access routerse.g. ISDN,ADSL
Core routere.g. OC48c POS
Core ATM switch
Shivkumar KalyanaramanRensselaer Polytechnic Institute
4
Basic Architectural Components
PolicingOutput
SchedulingSwitching
Routing
CongestionControl
ReservationAdmissionControl
Control
Datapath:per-packet processing
Shivkumar KalyanaramanRensselaer Polytechnic Institute
5
Basic Architectural Components:Forwarding Decision
ForwardingDecision
ForwardingDecision
ForwardingDecision
ForwardingTable
ForwardingTable
Interconnect
OutputScheduling
1.
2.
3.
ForwardingTable
Shivkumar KalyanaramanRensselaer Polytechnic Institute
6
Per-packet processing in an IP Router
1. Accept packet arriving on an incoming link.2. Lookup packet destination address in the
forwarding table, to identify outgoing port(s).3. Manipulate packet header: e.g., decrement
TTL, update header checksum.4. Send (switch) packet to the outgoing port(s).5. Classify and buffer packet in the queue.6. Transmit packet onto outgoing link.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
7
Lookup and Forwarding Engine
headerpayload
Router
Destination Address
Dest-network PortForwarding Table
Routing Lookup Data Structure
65.0.0.0/8128.9.0.0/16
149.12.0.0/19
31
7
Packet
Outgoing Port
Shivkumar KalyanaramanRensselaer Polytechnic Institute
8
Example Forwarding Table
7142.12.0.0/19
1128.9.0.0/16
365.0.0.0/ 8
Outgoing PortDestination IP Prefix
Prefix length
IP prefix: 0-32 bits
142.12.0.0/19
0 224
65.255.255.255128.9.16.14
128.9.0.0/1665.0.0.0/8
232-165.0.0.0
Shivkumar KalyanaramanRensselaer Polytechnic Institute
9
Prefixes can Overlap
128.9.16.0/21 128.9.172.0/21
Longest matching prefix 128.9.176.0/24
Routing lookup: Find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.
0 232-1
128.9.0.0/16142.12.0.0/1965.0.0.0/8
128.9.16.14
Shivkumar KalyanaramanRensselaer Polytechnic Institute
10
Difficulty of Longest Prefix Match
8
24
Prefix Values
Pref
ix
128.9.0.0/16 142.12.0.0/1965.0.0.0/8
128.9.16.14
128.9.172.0/21128.9.176.0/24
128.9.16.0/21
2-dimensional search: - Prefix Length- Prefix Value
32
Leng
th
Shivkumar KalyanaramanRensselaer Polytechnic Institute
11
Lookup Rates Required
12540.0OC768c2002-03
31.2510.0OC192c2000-01
7.812.5OC48c1999-00
1.940.622OC12c1998-99
40B packets (Mpps)
Line-rate (Gbps)
LineYear
DRAM: 50-80 ns, SRAM: 5-10 ns
31.25 Mpps ⇒ 33 ns
Shivkumar KalyanaramanRensselaer Polytechnic Institute
12
Update Rates RequiredRecent BGP studies show that updates can be:
Bursty: several 100s of routes updated/withdrawn => insert/delete operationsFrequent: Average 100+ updates per second
Need data structure to be efficient in terms of lookup as well as update (insert/delete) operations.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
13
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
Size of the Forwarding Table
95 96 97 98 99 00Year
Num
ber
of P
refi
xes
10,000/year
Renewed ExponentialGrowth
Renewed growth due to multi-homing of enterprise networks!
Source: http://www.telstra.net/ops/bgptable.html
Shivkumar KalyanaramanRensselaer Polytechnic Institute
14
Global routing table vs Moore's law since 1999Potential Hyper-Exponential Growth!
50000
90000
100000
110000
01/99
Pref
ixes
Global prefixesMoore's law
Double growth
160000
150000
140000
130000
120000
80000
70000
60000
04/99 07/99 10/99 01/00 04/00 07/00 10/00 01/01 04/01
Shivkumar KalyanaramanRensselaer Polytechnic Institute
15
Trees and Tries
Binary Search Tree
< >
< > < >
log2 N
N entries
Binary Search Trie
0 1
0 1 0 1
111010
Shivkumar KalyanaramanRensselaer Polytechnic Institute
16
Trees and TriesMultiway tries
16-ary Search Trie
0000, ptr 1111, ptr
0000, 0 1111, ptr
000011110000
0000, 0 1111, ptr
111111111111
Shivkumar KalyanaramanRensselaer Polytechnic Institute
17
Lookup: Multiway TriesTradeoffs
Degree ofTree
# MemReferences
# Nodes(x106)
Total Memory(Mbytes)
FractionWasted (%)
2 48 1.09 4.3 494 24 0.53 4.3 738 16 0.35 5.6 8616 12 0.25 8.3 9364 8 0.17 21 98256 6 0.12 64 99.5
Table produced from 215 randomly generated 48-bit addresses
Shivkumar KalyanaramanRensselaer Polytechnic Institute
18
Routing Lookups in Hardware
Prefix length
Num
ber
Most prefixes are 24-bits or shorter
Shivkumar KalyanaramanRensselaer Polytechnic Institute
19
Routing Lookups in Hardware14
2.19
.6.1
4
Prefixes up to 24-bits14
2.19
.614
1 Next Hop
24
Next Hop
142.19.6
224 = 16M entries
Shivkumar KalyanaramanRensselaer Polytechnic Institute
20
Routing Lookups in Hardware12
8.3.
72.4
4
Prefixes up to 24-bits
128.
3.72
44
1 Next Hop
128.3.72
24 0 Pointer
8
Prefixes above 24-bits
Next Hop
Next Hop
Next Hop
offs
etba
se
Shivkumar KalyanaramanRensselaer Polytechnic Institute
21
1.
Basic Architectural Components: Interconnect
ForwardingDecision
ForwardingTable
ForwardingDecision
ForwardingTable
ForwardingDecision
ForwardingTable Interconnect
2.Output
Scheduling
3.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
22
Most Ethernet switches and cheap packet routersBottleneck can be CPU, host-adaptor or I/O busWhat is costly? Bus ? Memory? Interface? CPU?
Shared Backplane
Line Interface
CPUMemory
CPU BufferMemory
LineInterface
DMA
MAC
LineInterface
DMA
MAC
LineInterface
DMA
MAC
First-Generation IP Routers
Shivkumar KalyanaramanRensselaer Polytechnic Institute
23
Second-Generation IP RoutersCPU Buffer
Memory
LineCard
DMA
MAC
LocalLocalBufferBuffer
MemoryMemory
LineCard
DMA
MAC
LocalLocalBufferBuffer
MemoryMemory
LineCard
DMA
MAC
LocalLocalBufferBuffer
MemoryMemory
Port mapping intelligence in line cardsHigher hit rate in local lookup cacheWhat is costly? Bus ? Memory? Interface? CPU?
Shivkumar KalyanaramanRensselaer Polytechnic Institute
24
Third-Generation Switches/Routers
LineCard
MAC
LocalBuffer
Memory
CPUCard
LineCard
MAC
LocalBuffer
Memory
Switched Backplane
Line Interface
Line Interface
Line Interface
Line Interface
Line Interface
Line Interface
Line Interface
Line InterfaceCPU
Memory
Third generation switch provides parallel paths (fabric)What’s costly? Bus? Memory, CPU?
Shivkumar KalyanaramanRensselaer Polytechnic Institute
25
Fourth-Generation Switches/RoutersClustering and Multistage
1 2 3 4 5 6 7 8 9 10 111213 1415 16
17 1819 20 21 22 23 2425 26 2728 29 30 31 32
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 21
1 2 3 4 5 6
7 8 9 10 11 12
Shivkumar KalyanaramanRensselaer Polytechnic Institute
26
Circuit switchA switch that can handle N calls has N logical inputs and N logical outputs
N up to 200,000Moves 8-bit samples from an input to an output port
Recall that samples have no headersDestination of sample depends on time at which it arrives at the switch
In practice, input trunks are multiplexedMultiplexed trunks carry frames = set of samples
Goal: extract samples from frame, and depending on position in frame, switch to output
each incoming sample has to get to the right output line and the right slot in the output frame
Shivkumar KalyanaramanRensselaer Polytechnic Institute
27
Call blockingCan’t find a path from input to outputInternal blocking
slot in output frame exists, but no pathOutput blocking
no slot in output frame is availableOutput blocking is reduced in transit switches
need to put a sample in one of several slots going to the desired next hop
Shivkumar KalyanaramanRensselaer Polytechnic Institute
28
Multiplexors and demultiplexorsMost trunks time division multiplex voice samplesAt a central office, trunk is demultiplexed and distributed to active circuitsSynchronous multiplexor
N input linesOutput runs N times as fast as input
…
123
N
MUX
123De-
MUX …1 2 3 … NN
Shivkumar KalyanaramanRensselaer Polytechnic Institute
29
Switching: what does a switch do?Transfers data from an input to an output
many ports (density), high speeds Eg: Crossbar
Shivkumar KalyanaramanRensselaer Polytechnic Institute
30
Time division switchingKey idea: when de-multiplexing, position in frame determines output trunkTime division switching interchanges sample position within a frame: time slot interchange (TSI)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
31
Space division switchingEach sample takes a different path through the switch, depending on its destination
Shivkumar KalyanaramanRensselaer Polytechnic Institute
32
CrossbarSimplest possible space-division switchCrosspoints can be turned on or off, long enough to transfer a packet from an input to an outputInternally nonblocking
but need N2 crosspointstime to set each crosspoint grows quadratically
Shivkumar KalyanaramanRensselaer Polytechnic Institute
33
Multistage crossbarIn a crossbar during each switching time only one cross-point per row or column is activeCan save crosspoints if a cross-point can attach to more than one input line (why?)This is done in a multistage crossbarNeed to rearrange connections every switching time
Shivkumar KalyanaramanRensselaer Polytechnic Institute
34
Multistage crossbarCan suffer internal blocking
unless sufficient number of second-level stages
Number of crosspoints < N2
Finding a path from input to output requires a depth-first-searchScales better than crossbar, but still not too well
120,000 call switch needs ~250 million crosspoints
Shivkumar KalyanaramanRensselaer Polytechnic Institute
35
Packet switchesIn a circuit switch, path of a sample is determined at time of connection establishmentNo need for a sample header--position in frame usedIn a packet switch, packets carry a destination field or label
Need to look up destination port on-the-flyDatagram switches
lookup based on entire destination address (longest-prefix match)
Cell or Label-switcheslookup based on VCI or Labels
Shivkumar KalyanaramanRensselaer Polytechnic Institute
36
Blocking in packet switchesCan have both internal and output blockingInternal
no path to outputOutput
trunk unavailableUnlike a circuit switch, cannot predict if packets will block (why?)If packet is blocked => must either buffer or drop
Shivkumar KalyanaramanRensselaer Polytechnic Institute
37
Dealing with blocking in packet switches
Over-provisioninginternal links much faster than inputs
Buffersat input or output
Backpressureif switch fabric doesn’t have buffers, prevent packet from entering until path is available
Parallel switch fabricsincreases effective switching capacity
Shivkumar KalyanaramanRensselaer Polytechnic Institute
38
Switch Fabrics: Buffered crossbarWhat happens if packets at two inputs both want to go to same output?Can defer one at an input bufferOr, buffer cross-points: complex arbiter
Shivkumar KalyanaramanRensselaer Polytechnic Institute
39
Switch fabric elementGoal: towards building “self-routing” fabricsCan build complicated fabrics from a simple element
Routing rule: if 0, send packet to upper output, else to lower output
If both packets to same output, buffer or drop
Shivkumar KalyanaramanRensselaer Polytechnic Institute
40
BanyanSimplest self-routing recursive fabric
What if two packets both want to go to the same output?
output blocking
Shivkumar KalyanaramanRensselaer Polytechnic Institute
41
Blocking in Banyan S/ws: SortingCan avoid blocking by choosing order in which packets appear at input portsIf we can
present packets at inputs sorted by outputremove duplicates remove gapsprecede banyan with a perfect shuffle stagethen no internal blocking
For example: [X, 010, 010, X, 011, X, X, X]:Sort => [010, 011, 011, X, X, X, X, X]Remove dups => [010, 011, X, X, X, X, X, X]Shuffle => [010, X, 011, X, X, X, X, X]Need sort, shuffle, and trap networks
Shivkumar KalyanaramanRensselaer Polytechnic Institute
42
Sorting using MergingBuild sorters from merge networksAssume we can merge two sorted listsSort pairwise, merge, recurse
Shivkumar KalyanaramanRensselaer Polytechnic Institute
43
Non-Blocking Batcher-Banyan
3
7
5
2
6
0
1
4
7
2
3
5
6
1
0
4
7
5
2
3
1
0
6
4
7
0
5
1
3
4
2
6
7
4
5
6
0
3
1
2
7
6
4
5
3
2
0
2
7
6
5
4
3
2
1
0
000001010011100101110111
Batcher Sorter Self-Routing Network
• Fabric can be used as scheduler. •Batcher-Banyan network is blocking for multicast.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
44
1.
Basic Architectural Components: Queuing, Classification
ForwardingDecision
ForwardingTable
ForwardingDecision
ForwardingTable
ForwardingDecision
ForwardingTable
OutputScheduling
3.
Interconnect2.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
45
Queuing:Two basic techniques
Input Queueing Output Queueing
Usually a non-blockingswitch fabric (e.g. crossbar)
Usually a fast bus
Shivkumar KalyanaramanRensselaer Polytechnic Institute
46
Queuing:Output Queueing
Individual Output Queues Centralized Shared Memory
Memory b/w = (N+1).R
1
2
N
Memory b/w = 2N.R
1
2
N
Shivkumar KalyanaramanRensselaer Polytechnic Institute
47
Input QueueingHead of Line Blocking
Del
ay
Load58.6% 100%
Shivkumar KalyanaramanRensselaer Polytechnic Institute
48
Solution: Input Queueing w/Virtual output queues (VOQ)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
49
Input QueuesVirtual Output Queues
Del
ay
Load100%
Shivkumar KalyanaramanRensselaer Polytechnic Institute
50
Packet Classification
Action
--------
---- ----
--------
Predicate ActionClassifier (Policy Database)
Packet Classification
Forwarding Engine
Incoming Packet
HEADER
Shivkumar KalyanaramanRensselaer Polytechnic Institute
51
Multi-field Packet Classification
Field 1 Field 2 … Field k A ction
Rule 1 152.163.190.69/21 152.163.80.11/32 … UDP A1
Rule 2 152.168.3.0/24 152.163.0.0/16 … TCP A2
… … … … … …
Rule N 152.168.0.0/16 152.0.0.0/8 … ANY An
Given a classifier with N rules, find the action associated with the highest priority rule matching an incoming packet.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
52
Prefix matching: 1-d range problem
0 232-1
128.9/16
128.9.16.14
128.9.16/20 128.9.176/20
128.9.19/24
128.9.25/24
Most specific route = “longest matching prefix”
Shivkumar KalyanaramanRensselaer Polytechnic Institute
53
Classification: 2D Geometry problemField #1 Field #2 Data
R5 R4
R3
R2R1
R7
P2
Fiel
d #2
R6
P1
e.g. (128.16.46.23, *)e.g. (144.24/16, 64/24)
Field #1
Shivkumar KalyanaramanRensselaer Polytechnic Institute
54
Summary
High speed routers: lookup, switching, classification, buffer managementLookup: Range-matching, tries, multi-way triesSwitching: circuit s/w, crossbar, batcher-banyan, Queuing: input/output queuing issuesClassification: Multi-dimensional geometry problem
Top Related