High Performance Routing
1
High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
High Performance Routing
Nick McKeown
Assistant Professor of Electrical Engineering and Computer Science, Stanford University
Abrizio/PMC-Sierra Inc.
http://www.stanford.edu/~nickm
2
Outline
• Outline
• Review: What is a Router?
• The Evolution of Routers
• Single-stage switching: The Fork-Join Router
3
Outline
• Switching is the bottleneck in a router.
• The trend has been to overcome limitations in memory bandwidth:
– Shared memory -> Single-stage, crossbar-based, combined input and output queued (CIOQ).
• …and to reduce power per rack and per system:
– Single-box systems -> Multi-rack systems (LCS).
4
Outline (2)
• What comes next?
• Multistage switches solve the wrong problem:
– N^2 (the number of crosspoints) is not the problem.
– Multistage switches are more blocking, more power-hungry and less predictable.
• Parallel single-stage switches (e.g. the Fork-Join Router) are non-blocking, use less power, can achieve equally high capacity, and can be predictable.
5
Outline
• Outline
• Review: What is a Router?
• The Evolution of Routers
• Single-stage switching: The Fork-Join Router
6
Basic Architectural Components
[Figure: router components.]
Control Plane: Routing Protocols, Routing Table, Reservation/Admission Control.
Datapath (per-packet processing): Forwarding Table, Packet Classification, Policing & Access Control, Switching, Output Scheduling.
Datapath stages: 1. Ingress, 2. Interconnect, 3. Egress.
7
Basic Architectural Components
Datapath: per-packet processing
[Figure: several linecards, each with a Forwarding Table, a Classifier Table, Policing & Access Control, and a Forwarding Decision block.]
1. Ingress. Limitation: memory b/w.
2. Interconnect. Limitation: interconnect b/w; power & arbitration.
3. Egress. Limitation: memory b/w.
8
Outline
• Outline
• Review: What is a Router?
• The Evolution of Routers
• Single-stage switching: The Fork-Join Router
9
First Generation Routers
[Figure: a CPU with Route Table and Buffer Memory, connected by a Shared Backplane to several Line Interfaces, each with a MAC.]
Fixed length “DMA” blocks or cells. Reassembled on egress linecard.
Fixed length cells or variable length packets.
Typically <0.5Gb/s aggregate capacity.
10
First Generation Routers
Queueing Structure: Shared Memory
[Figure: Inputs 1..N and Outputs 1..N sharing a single memory buffer.]
• Large, single, dynamically allocated memory buffer:
– N writes per “cell” time
– N reads per “cell” time
• Limited by memory bandwidth.
• A large body of work has proven and made possible:
– Fairness
– Delay Guarantees
– Delay Variation Control
– Loss Guarantees
– Statistical Guarantees
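The shared-memory limit can be made concrete with a little arithmetic. A minimal sketch, assuming each of the N inputs writes one cell and each of the N outputs reads one cell in every cell time, so the single memory must sustain 2·N·R; the function name is hypothetical and the port count and line rate in the example are illustrative, not taken from the slides.

```python
def shared_memory_bandwidth(n_ports: int, line_rate_gbps: float) -> float:
    """Memory bandwidth (Gb/s) needed by an N-port shared-memory switch.

    Per cell time the memory absorbs N writes (one per input) and serves
    N reads (one per output), each at line rate R: total 2 * N * R.
    """
    return 2 * n_ports * line_rate_gbps


if __name__ == "__main__":
    # Illustrative: 16 ports at 2.5 Gb/s each.
    print(shared_memory_bandwidth(16, 2.5))
```

Because the requirement grows linearly with N, adding ports quickly outruns what a single memory can deliver.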
11
Second Generation Routers
[Figure: a Route Table CPU with Buffer Memory; Line Cards, each with a MAC, Buffer Memory and a Forwarding Cache; a Slow Path. Labels: Drop Policy; Drop Policy or Backpressure; Output Link Scheduling.]
Typically <5Gb/s aggregate capacity.
12
Second Generation Routers
As caching became ineffective
[Figure: a Route Table CPU and an Exception Processor; Line Cards, each with a MAC, Buffer Memory and a full Forwarding Table in place of the forwarding cache.]
13
Second Generation Routers
Queueing Structure: Combined Input and Output Queueing
[Figure: input and output buffers connected by a Bus.]
1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by bus speed.
14
Third Generation Routers
[Figure: Line Cards (MAC, Local Buffer Memory, Forwarding Table) and a CPU Card (CPU, Memory, Routing Table) on a Switched Backplane; label: Line Interface.]
Typically <50Gb/s aggregate capacity.
15
Third Generation Routers
Queueing Structure
[Figure: input and output buffers connected by a Switch.]
1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by switch fabric speedup.
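In contrast to the shared-memory case, where one memory handles N writes and N reads per cell time, the per-buffer rate here depends on the fabric speedup S rather than on N. A sketch under the usual assumption that, per cell time, an input buffer takes one write from the line and up to S reads by the fabric, and an output buffer takes up to S writes from the fabric and one read by the line; the function name is hypothetical.

```python
def cioq_buffer_bandwidth(line_rate_gbps: float, speedup: float) -> float:
    """Memory bandwidth (Gb/s) of one input or one output buffer in a CIOQ switch.

    Input buffer: 1 write at R (from the line) + S reads (to the fabric).
    Output buffer: S writes (from the fabric) + 1 read at R (to the line).
    Either way the buffer must sustain (1 + S) * R, independent of N.
    """
    return (1 + speedup) * line_rate_gbps


if __name__ == "__main__":
    # Illustrative: a 10 Gb/s port with speedup 2.
    print(cioq_buffer_bandwidth(10.0, 2))
```

With S = 1 this reduces to the slide's 1 write and 1 read per cell time.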
16
Third Generation Routers
[Figure: a rack, 19” or 23” wide and 7’ tall.]
• Size-constrained: 19” or 23” wide.
• Power-constrained: ~<6kW.
• QoS unfriendly: input congestion.
Supply: 100A/200A maximum at 48V
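The supply limits convert to power with P = V × I; worked out for the two currents on the slide:

```latex
P_{100\,\mathrm{A}} = 48\,\mathrm{V} \times 100\,\mathrm{A} = 4.8\,\mathrm{kW},
\qquad
P_{200\,\mathrm{A}} = 48\,\mathrm{V} \times 200\,\mathrm{A} = 9.6\,\mathrm{kW}
```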
17
Fourth Generation Routers/Switches
[Figure: Linecards connected to a Switch Core over optical links up to 2km long.]
The LCS Protocol
18
Fourth Generation Routers/Switches
The LCS Protocol
What is LCS?
1. Credit-based flow control: enables separation of linecards from the switch core.
2. Label-based multicast: enables scaling.
Its Benefits
1. Large number of ports. Separation enables a large number of ports in multiple racks.
2. Minimizes switch core complexity and power. The switch core can be bufferless and lossless; QoS, discard, etc. are performed on the linecard.
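The credit mechanism in point 1 can be sketched generically. This is a minimal illustration of credit-based flow control, not the LCS message format, which the slide does not describe; the class and method names are hypothetical. The receiver (for example, a switch-core port) grants one credit per free cell buffer, the sender (the linecard) transmits only while it holds credits, and credits come back as the receiver drains cells, so the receiver's buffer cannot overflow.

```python
from __future__ import annotations

from collections import deque


class CreditLink:
    """Sender side of a credit-based flow-controlled link (illustrative only)."""

    def __init__(self, buffer_cells: int) -> None:
        self.credits = buffer_cells            # one credit per free receiver buffer
        self.waiting: deque[str] = deque()     # cells held back on the linecard
        self.in_flight: deque[str] = deque()   # cells sent, occupying receiver buffers

    def enqueue(self, cell: str) -> None:
        """Linecard accepts a cell for transmission."""
        self.waiting.append(cell)

    def send(self) -> list[str]:
        """Transmit as many waiting cells as the current credits allow."""
        sent = []
        while self.waiting and self.credits > 0:
            cell = self.waiting.popleft()
            self.credits -= 1                  # each cell consumes one credit
            self.in_flight.append(cell)
            sent.append(cell)
        return sent

    def receiver_drains(self, n: int) -> None:
        """Receiver forwards up to n cells onward and returns one credit per cell."""
        for _ in range(min(n, len(self.in_flight))):
            self.in_flight.popleft()
            self.credits += 1


if __name__ == "__main__":
    link = CreditLink(buffer_cells=2)
    for cell in ["a", "b", "c"]:
        link.enqueue(cell)
    print(link.send())        # ['a', 'b']: two credits, two cells
    print(link.send())        # []: out of credits, 'c' waits on the linecard
    link.receiver_drains(1)   # switch core forwards one cell, returns a credit
    print(link.send())        # ['c']
```

Holding the excess cells on the linecard, rather than in the core, is what lets the core stay small and lossless.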
19
Fourth Generation Routers/Switches
Queueing Structure
1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by switch fabric speedup.
[Figure: ingress Linecards perform Lookup & Drop Policy and hold Virtual Output Queues; a bufferless Switch Core contains the Switch Fabric and Switch Arbitration; egress Linecards perform Output Scheduling.]
Typically <5Tb/s aggregate capacity.
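The slide names Virtual Output Queues and switch arbitration but not a specific arbitration algorithm. A sketch, assuming fixed-size cells and using a simple round-robin greedy matcher as a stand-in (it is not iSLIP or any algorithm from the slides; all names are hypothetical). Each input keeps one queue per output, so a cell waiting for a busy output does not block cells behind it that are bound for idle outputs.

```python
from collections import deque


def make_voqs(n_ports):
    """voqs[i][j] holds the cells at input i destined for output j."""
    return [[deque() for _ in range(n_ports)] for _ in range(n_ports)]


def greedy_match(voqs, start):
    """One cell time of a round-robin greedy arbiter (illustrative stand-in).

    Inputs are visited in order beginning at `start`; each takes the first
    still-free output (scanning from `start`) for which it has a queued cell.
    No input or output is used twice, so the result is a valid crossbar
    configuration: a maximal, not necessarily maximum, matching.
    """
    n = len(voqs)
    taken = set()
    match = {}
    for step in range(n):
        i = (start + step) % n
        for off in range(n):
            j = (start + off) % n
            if j not in taken and voqs[i][j]:
                match[i] = j
                taken.add(j)
                break
    return match


def transfer(voqs, match):
    """Move one cell across the fabric for each matched (input, output) pair."""
    return {j: voqs[i][j].popleft() for i, j in match.items()}


if __name__ == "__main__":
    voqs = make_voqs(2)
    voqs[0][0].append("a")   # input 0 -> output 0
    voqs[0][1].append("b")   # input 0 -> output 1
    voqs[1][0].append("c")   # input 1 -> output 0
    match = greedy_match(voqs, start=1)
    print(match)                    # {1: 0, 0: 1}
    print(transfer(voqs, match))    # {0: 'c', 1: 'b'}
```

Rotating `start` from one cell time to the next spreads priority across ports; with a single FIFO per input, cell "b" would instead sit behind "a".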
20
Myths about CIOQ-based crossbar switches
1. “Input-queued crossbars have low throughput”
– An input-queued crossbar can have as high throughput as any switch.
2. “Crossbars don’t support multicast traffic well”
– A crossbar inherently supports multicast efficiently.
3. “Crossbars don’t scale well”
– Today, it is the number of chip I/Os, not the number of crosspoints, that limits the size of a switch fabric. Expect 5Tb/s crossbar switches.
21
Myths about CIOQ-based crossbar switches (2)
4. “Crossbar switches can’t support delay/QoS guarantees”
– With an internal speedup of 2, a CIOQ switch can precisely emulate a shared memory switch for all traffic.
22
What makes sense today?
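| | Shared Memory | Input Queued | CIOQ | Multistage |
|---|---|---|---|---|
| Blocking | No | No | No | Yes |
| Speedup | High | High | Small | High |
| Emulation of SM | Yes | No | Yes | No |
| Multicast | Good | Good | Good | Poor |
| Resequencing | No | No | No | Yes |
| Power | Low | OK | OK | High |
| Packaging | - | OK | OK | Complex |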
23
What makes sense tomorrow?
Single-stage (if possible):
– Reduces complexity
– Minimizes interconnect b/w
– Minimizes power
24
Outline
• Outline
• Review: What is a Router?
• The Evolution of Routers
• Single-stage switching: The Fork-Join Router
25
Buffer Memory
How Fast Can I Make a Packet Buffer?
[Figure: Buffer Memory built from 5ns SRAM, with 64-byte wide buses.]
Rough estimate:
– 5ns per memory operation.
– Two memory operations per packet (one write, one read).
– Therefore, at most 512 bits (64 bytes) every 10ns, i.e. 51.2Gb/s.
– In practice, closer to 40Gb/s.
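The rough estimate is simple arithmetic: one 64-byte word (512 bits) moves per two 5ns memory operations. A sketch of the same calculation; the function name is hypothetical, and it captures only the raw SRAM limit, not the overheads that bring the practical figure down toward 40Gb/s.

```python
def max_buffer_rate_gbps(bus_width_bytes: int, op_time_ns: float,
                         ops_per_packet: int = 2) -> float:
    """Upper bound on packet-buffer throughput in Gb/s.

    Each bus-width chunk is written once and read once (ops_per_packet = 2),
    and each memory operation takes op_time_ns. One bit per ns is 1 Gb/s.
    """
    bits_per_chunk = bus_width_bytes * 8
    return bits_per_chunk / (ops_per_packet * op_time_ns)


if __name__ == "__main__":
    print(max_buffer_rate_gbps(64, 5.0))  # 51.2
```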
26
Buffer Memory
Is It Going to Get Better?
[Figure: two plots against time, one of Specmarks, memory size and gate density, the other of memory bandwidth (to core).]
27
Fork-Join Router
Sponsored by NSF and ITRI
How can we:
– Increase capacity.
– Reduce power per subsystem.
While at the same time…
– Keep the system simple.
– Support line rates faster than memory bandwidth.
– Support guaranteed services.
Increase parallelism.
Multiple racks.
Single-stage buffering.
Pkt-by-pkt load balancing.
Hmmm….?
28
The Fork-Join Router
[Figure: N inputs (1..N) at rate R are spread across k parallel routers (1, 2, ..., k) and rejoined at N outputs (1..N) at rate R; label: Bufferless.]
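The packet-by-packet load balancing listed on the previous slide can be illustrated with the simplest possible policy, plain round-robin over the k layers. This sketch assumes fixed-size cells, so equal cell counts mean equal rates; the slides do not specify the Fork-Join Router's actual layer-selection rule, and the function name is hypothetical.

```python
from collections import Counter


def fork(cells, k):
    """Assign arriving cells to k layers, one cell at a time, round-robin.

    Returns (layer, cell) pairs. In every run of k consecutive cells each
    layer gets exactly one, so a port at rate R offers each layer about R/k.
    """
    return [(n % k, cell) for n, cell in enumerate(cells)]


if __name__ == "__main__":
    assignment = fork(range(8), k=4)
    print(assignment[:4])                              # [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(Counter(layer for layer, _ in assignment))   # each of the 4 layers gets 2 cells
```

Round-robin ignores each cell's egress port, so this sketch says nothing about how evenly the traffic for any one output is spread across layers.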
29
The Fork-Join Router
• Advantages
– Single stage of buffering
– Power per subsystem reduced by a factor of k
– Memory bandwidth reduced by a factor of k
– Forwarding table lookup rate reduced by a factor of k
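The factor-of-k savings follow because each layer carries only 1/k of every port's traffic. A sketch of the per-layer rates, assuming an even split and a 40-byte minimum packet for the worst-case lookup rate; the packet size is an assumption, not from the slides, and the function name is hypothetical.

```python
def per_layer_rates(line_rate_gbps: float, k: int, min_packet_bytes: int = 40):
    """Per-layer link rate (Gb/s) and worst-case lookups/s with traffic split over k layers.

    Each layer sees R/k; at one lookup per minimum-size packet the lookup
    rate is (R/k) / (8 * min_packet_bytes) packets per second.
    """
    layer_rate_gbps = line_rate_gbps / k
    lookups_per_sec = layer_rate_gbps * 1e9 / (8 * min_packet_bytes)
    return layer_rate_gbps, lookups_per_sec


if __name__ == "__main__":
    # Illustrative: a 40 Gb/s port, unsplit (k = 1) versus split over k = 4 layers.
    for k in (1, 4):
        rate, lps = per_layer_rates(40.0, k)
        print(f"k={k}: {rate} Gb/s per layer, {lps / 1e6} M lookups/s")
```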
30
The Fork-Join Router
• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?
31
A Parallel Packet Switch
[Figure: N inputs (1..N) at rate R connect to k Output Queued Switches (layers 1, 2, ..., k), which connect to N outputs (1..N) at rate R.]
Arriving packet tagged with egress port.
32
Performance Questions
1. Can it be work-conserving?
2. Can it emulate a single big output-queued switch?
3. Can it support delay guarantees, strict priorities, WFQ, …?
33
Work Conservation
[Figure: input 1 at rate R connects to k Output Queued Switches (layers 1, 2, ..., k) over links of rate R/k, and the layers connect to output 1 at rate R over links of rate R/k. Labels: Input Link Constraint, Output Link Constraint.]
34
Work Conservation
[Figure: the same arrangement, with a worked example of numbered cells (1 to 5) crossing the R/k links; label: Output Link Constraint.]
35
Work Conservation
[Figure: the parallel packet switch (N ports at rate R, k Output Queued Switch layers), with every internal link running at S(R/k).]
36
Precise Emulation of an Output Queued Switch
[Figure: an N×N Output Queued Switch beside an N×N Parallel Packet Switch, ports 1..N, asking whether the two are equivalent (= ?).]
37
Parallel Packet Switch
Theorems
1. If $S > 2k/(k+2) \approx 2$, then a parallel packet switch can be work-conserving for all traffic.
2. If $S > 2k/(k+2) \approx 2$, then a parallel packet switch can precisely emulate an FCFS output-queued switch for all traffic.
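Rewriting the bound shows why it is summarized as roughly 2: it is always below 2 and approaches 2 as the number of layers k grows.

```latex
\frac{2k}{k+2} = \frac{(2k+4) - 4}{k+2} = 2 - \frac{4}{k+2} < 2,
\qquad
\lim_{k \to \infty} \frac{2k}{k+2} = 2
```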
38
Parallel Packet Switch
Theorems
3. If $S > 3k/(k+3) \approx 3$, then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
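To see what these conditions mean for the hardware, the sketch below evaluates the bounds ck/(k+c) (c = 2 for Theorems 1 and 2, c = 3 for Theorem 3) and the resulting internal link rate S·R/k. The function names are hypothetical, and the theorems require S strictly greater than the bound.

```python
from fractions import Fraction


def speedup_bound(c: int, k: int) -> Fraction:
    """Bound c*k/(k+c): c = 2 for work conservation / FCFS emulation,
    c = 3 for WFQ and strict-priority emulation. S must exceed it."""
    return Fraction(c * k, k + c)


def internal_link_rate(line_rate, k, speedup):
    """Rate of each of the k links between a port and the layers: S * R / k."""
    return speedup * line_rate / k


if __name__ == "__main__":
    for k in (2, 8, 32):
        s2, s3 = speedup_bound(2, k), speedup_bound(3, k)
        # At S = 3 each link runs at 3R/k, which is below R once k > 3.
        print(f"k={k}: S > {s2} (FCFS), S > {s3} (QoS); "
              f"link rate at S=3 is {internal_link_rate(1.0, k, 3):.3f} R")
```

The exact `Fraction` keeps bounds such as 8/5 readable instead of printing rounded floats.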
39
Parallel Packet Switch
Theorems
4. If $S > 2$, then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
40
The Fork-Join Router
• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?
41
The Fork-Join Router
Lookahead Forwarding Table Lookups
– Packet tagged with egress port at next router.
– Lookup performed in parallel at rate R/k.
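One reading of this slide: the fork stage must know a packet's egress port at the full line rate R, before any layer sees the packet, so the lookup is done one hop early, inside the upstream router's layers at R/k, and carried as a tag. A minimal sketch, assuming the upstream router can consult the next router's forwarding table; that assumption, the tag field `next_egress`, and all function names are hypothetical, and a linear scan stands in for a real longest-prefix-match structure.

```python
import ipaddress


def lpm(table, addr):
    """Longest-prefix match: port of the most specific prefix containing addr, else None."""
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, port)
    for prefix, port in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, port)
    return None if best is None else best[1]


def upstream_tag(packet, next_router_table):
    """Upstream router (hypothetical): look up the egress port the *next*
    router will use, and attach it to the packet as a tag."""
    packet["next_egress"] = lpm(next_router_table, packet["dst"])
    return packet


def downstream_forward(packet):
    """Next router: no lookup needed on arrival; its fork stage just reads the tag."""
    return packet["next_egress"]


if __name__ == "__main__":
    next_router_table = {"10.0.0.0/8": 1, "10.1.0.0/16": 3}  # prefix -> egress port (hypothetical)
    pkt = upstream_tag({"dst": "10.1.2.3"}, next_router_table)
    print(downstream_forward(pkt))  # 3: the /16 is the longest match
```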
42
The Fork-Join Router
[Figure: the Fork-Join Router, with N ports (1..N) at rate R spread across k parallel routers (1, 2, ..., k).]
Expect >50Tb/s aggregate capacity
43
Conclusions
• The main problems are power (supply and dissipation) and memory bandwidth.
• Multi-stage switches solve the wrong problem.
• Single-stage switches are here to stay.
• Very high capacity single-stage electronic routers are feasible.