INF5050 – Protocols and Routing in the Internet (Friday 6.2.2015)
Presented by Tor Skeie
Subject: IP-router architecture
Nick McKeown 2
High Performance Switching and Routing, Telecom Center Workshop: Sept 4, 1997.
This presentation is based on slides from Nick McKeown, with updates.
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu, www.stanford.edu/~nickm
Stanford High Performance Networking group: http://klamath.stanford.edu
Nick McKeown 3
Outline
Background: What is a router? Why do we need faster routers? Why are they hard to build?
Architectures and techniques: the evolution of router architecture, IP address lookup, packet buffering, switching.
The Future
Nick McKeown 4
What is Routing?
[Figure: routers R1-R5 connecting hosts A-F; one router's forwarding table is shown.]
Destination  Next Hop
D            R3
E            R3
F            R5
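As a toy illustration (mine, not the lecture's) of what such a table expresses:

```python
# Toy view of the forwarding decision: the table from the figure as a dict.
next_hop = {"D": "R3", "E": "R3", "F": "R5"}

def forward(destination: str) -> str:
    return next_hop[destination]

print(forward("D"))  # packets for D are sent towards R3
```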
Nick McKeown 5
What is Routing?
[Figure: the same network; a packet addressed to D is forwarded hop by hop, each router consulting its own table.]
[Figure: the IPv4 header (20 bytes without options): Version, Header Length, Type of Service, Total Packet Length, Fragment ID, Flags, Fragment Offset, TTL, Protocol, Header Checksum, Source Address, Destination Address, Options (if any), then the Data.]
Nick McKeown 6
Points of Presence (POPs)
[Figure: customers A-F connected through a network of Points of Presence, POP1-POP8.]
Nick McKeown 7
Where High Performance Routers are Used
[Figure: a backbone network of routers R1-R16, with edge links at 2.5 Gb/s and core links at 140 Gb/s.]
Nick McKeown 8
What a Router Looks Like
Cisco CRS-3 (CRS-1 16-slot single shelf in picture): 2.14 m x 0.91 m x 0.60 m. Capacity: 4.48 Tb/s. Power: 12.3 kW. Weight: 723 kg.
Juniper M320 (M160 in picture): 0.88 m x 0.65 m x 0.44 m. Capacity: 160 Gb/s. Power: 3.5 kW.
Capacity is the sum of the rates of the linecards.
Nick McKeown 9
A fully configured CRS-3 has a capacity of 322 Tb/s.
"The Cisco CRS-3 triples the capacity of its predecessor, the Cisco CRS-1 Carrier Routing System, with up to 322 Terabits per second, which enables the entire printed collection of the Library of Congress to be downloaded in just over one second; every man, woman and child in China to make a video call, simultaneously; and every motion picture ever created to be streamed in less than four minutes."
Some Multi-rack Routers
Alcatel 7670 RSP, Juniper TX8/T640, Chiaro, Avici TSR.
Nick McKeown 11
Generic Router Architecture
[Figure: per-packet datapath. Header processing: look up the destination IP address in the address table (about 1M prefixes, held in off-chip DRAM) to find the next hop, then update the header. Buffering: queue the packet in buffer memory (about 1M packets, also off-chip DRAM).]
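A toy sketch of the two stages in this figure (the exact-match table, the packet format, and the omission of the checksum update are simplifications of my own):

```python
from collections import deque

# Toy per-packet datapath mirroring the figure: header processing
# (address lookup + header update) followed by queueing in buffer memory.
address_table = {"198.51.100.14": "port3"}     # exact-match toy table
output_queues = {"port3": deque()}

def process(packet: dict) -> None:
    next_hop = address_table[packet["dst"]]    # 1. IP address lookup
    packet["ttl"] -= 1                         # 2. update header (a real router
                                               #    also fixes the checksum)
    output_queues[next_hop].append(packet)     # 3. queue packet for that output

process({"dst": "198.51.100.14", "ttl": 64})
print(len(output_queues["port3"]))             # -> 1
```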
Nick McKeown 12
Generic Router Architecture
[Figure: the same datapath replicated across linecards: each linecard has its own header processing block with its own address table, and its own buffer manager with buffer memory.]
Nick McKeown 13
Why do we Need Faster Routers?
1. To prevent routers from becoming the bottleneck in the Internet.
2. To increase POP capacity, and to reduce cost, size and power.
Nick McKeown 14
Why we Need Faster Routers 1: To prevent routers from being the bottleneck
[Figure: 1985-2000 growth of packet-processing power (Spec95Int CPU results), roughly 2x every 18 months, versus fiber link capacity (TDM, then DWDM), roughly 2x every 7 months. Source: SPEC95Int & David Miller, Stanford.]
Nick McKeown 15
Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs
[Figure: a POP built from many smaller routers versus one built from fewer, larger routers.]
Ports: price >$100k, power >400W. It is common for 50-60% of the ports in a POP to be used just for interconnecting its routers.
Nick McKeown 16
Why are Fast Routers Difficult to Make?
1. It’s hard to keep up with Moore’s Law: the bottleneck is memory speed, and memory speed is not keeping up with Moore’s Law.
Nick McKeown 17
Why are Fast Routers Difficult to Make?
1. It’s hard to keep up with Moore’s Law: the bottleneck is memory speed, and memory speed is not keeping up with Moore’s Law.
[Figure: the speed of commercial DRAM improves only about 1.1x every 18 months, while Moore’s Law gives 2x every 18 months.]
Nick McKeown 18
Why are Fast Routers Difficult to Make?
1. It’s hard to keep up with Moore’s Law: the bottleneck is memory speed, and memory speed is not keeping up with Moore’s Law.
2. Moore’s Law is too slow: routers need to improve faster than Moore’s Law.
Nick McKeown 19
Router Performance Exceeds Moore’s Law
Growth in capacity of commercial routers:
Capacity 1992 ~ 2 Gb/s
Capacity 1995 ~ 10 Gb/s
Capacity 1998 ~ 40 Gb/s
Capacity 2001 ~ 160 Gb/s
Capacity 2003 ~ 640 Gb/s
Capacity 2008 ~ 100 Tb/s
Capacity 2013 ~ 920 Tb/s
Average growth rate: 2.2x / 18 months, but over the last 5 years: 2.8x / 18 months.
2013: The Cisco CRS-X multi-shelf router has a capacity of 921.6 Tb/s (1152 ports).
Nick McKeown 20
Outline
Background: What is a router? Why do we need faster routers? Why are they hard to build?
Architectures and techniques: the evolution of router architecture, IP address lookup, packet buffering, switching.
The Future
Nick McKeown 21
First Generation Routers
[Figure: a CPU with route table and buffer memory, connected over a shared backplane to several line interfaces (MACs).]
Typically <0.5 Gb/s aggregate capacity.
Nick McKeown 22
Second Generation Routers
[Figure: a CPU with the route table; linecards, each with a MAC, buffer memory, and a local forwarding cache, attached to a shared bus.]
Typically <5 Gb/s aggregate capacity.
Nick McKeown 23
Third Generation Routers
[Figure: linecards, each with a MAC, local buffer memory, and a forwarding table, connected by a switched backplane; a CPU card holds the routing table.]
Typically <50 Gb/s aggregate capacity.
Nick McKeown 24
Fourth Generation Routers/Switches
Optics inside a router for the first time.
[Figure: linecards connected to a switch core over optical links hundreds of metres long.]
100-1000 Tb/s routers available / in development.
Nick McKeown 25
Outline
Background: What is a router? Why do we need faster routers? Why are they hard to build?
Architectures and techniques: the evolution of router architecture, IP address lookup, packet buffering, switching.
The Future
Nick McKeown 26
Generic Router Architecture
[Figure: the generic per-linecard datapath again (header processing with address table, buffer manager with buffer memory), with the IP address lookup stage highlighted as the next topic.]
Nick McKeown 27
IP Address Lookup
Why it’s thought to be hard:
1. It’s not an exact match: it’s a longest-prefix match.
2. The table is large: about 550,000 entries today, and growing.
3. The lookup must be fast: about 2 ns per lookup for a 140 Gb/s line.
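The 2 ns figure corresponds to back-to-back minimum-size (40-byte) packets, an assumption the slide leaves implicit:

$$ t = \frac{40 \times 8\ \text{bit}}{140\ \text{Gb/s}} \approx 2.3\ \text{ns per packet, and hence per lookup.} $$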
Nick McKeown 28
Longest Prefix Match is Harder than Exact Match
• The destination address of an arriving packet does not carry with it the information needed to determine the length of the longest matching prefix.
• Hence, one needs to search the space of all prefix lengths, as well as the space of all prefixes of a given length.
Nick McKeown 29
IP Lookups find Longest Prefixes
[Figure: the 32-bit address space (0 to 2^32 - 1) with the prefixes 65.0.0.0/8, 128.9.0.0/16, 142.12.0.0/19, 128.9.16.0/21, 128.9.172.0/21, and 128.9.176.0/24 marked as ranges. The address 128.9.16.14 falls inside both 128.9.0.0/16 and 128.9.16.0/21; the lookup must return the more specific /21.]
Routing lookup: Find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.
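A tiny sketch of the same lookup using Python's standard ipaddress module; the next-hop names are invented for illustration:

```python
import ipaddress

# Longest-prefix match over the prefixes in the figure (next hops are made up).
prefixes = {
    "65.0.0.0/8": "R2",
    "128.9.0.0/16": "R3",
    "128.9.16.0/21": "R4",
    "128.9.172.0/21": "R5",
    "128.9.176.0/24": "R6",
    "142.12.0.0/19": "R7",
}

def lookup(dst: str):
    addr = ipaddress.ip_address(dst)
    best = None
    for p, nh in prefixes.items():
        net = ipaddress.ip_network(p)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nh)      # keep the most specific (longest) match
    return best

print(lookup("128.9.16.14"))      # matches 128.9.16.0/21, not 128.9.0.0/16
```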
Nick McKeown 30
Address Tables are Large
Nick McKeown 31
Lookups Must be Fast
Nick McKeown 32
IP Address Lookup: Binary Tries
Example prefixes: a) 00001, b) 00010, c) 00011, d) 001, e) 0101, f) 011, g) 100, h) 1010, i) 1100, j) 11110000.
[Figure: a binary trie over these prefixes; each left edge is a 0 bit, each right edge a 1 bit, and the nodes labelled a-j mark where prefixes end.]
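A minimal binary-trie sketch (my own illustration, not the lecture's implementation) built from the example prefixes above:

```python
# Binary trie for longest-prefix match on bit-string prefixes.
class Node:
    def __init__(self):
        self.children = [None, None]   # child for bit 0 and bit 1
        self.value = None              # set if a prefix ends here

def insert(root, prefix, value):
    node = root
    for bit in prefix:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = Node()
        node = node.children[i]
    node.value = value

def longest_prefix_match(root, addr_bits):
    node, best = root, None
    for bit in addr_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.value is not None:
            best = node.value          # remember the deepest match seen so far
    return best

root = Node()
for value, prefix in [("a", "00001"), ("b", "00010"), ("c", "00011"),
                      ("d", "001"), ("e", "0101"), ("f", "011"),
                      ("g", "100"), ("h", "1010"), ("i", "1100"),
                      ("j", "11110000")]:
    insert(root, prefix, value)

print(longest_prefix_match(root, "00110000"))  # -> "d" (longest match is 001)
```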
Nick McKeown 33
Multi-bit Tries
Binary trie: depth = W, degree = 2, stride = 1 bit.
Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits.
Time ~ W/k; storage ~ N * (W/k) * 2^(k-1), where W = length of the longest prefix and N = number of prefixes.
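Plugging illustrative numbers into these formulas (W = 32, k = 4, and the roughly 550,000 prefixes quoted earlier):

$$ \text{depth} = \frac{32}{4} = 8 \ \text{memory accesses}, \qquad \text{storage} \sim 550{,}000 \cdot \frac{32}{4} \cdot 2^{3} \approx 35\ \text{million entries.} $$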
Nick McKeown 34
Prefix Length Distribution
99.5% of prefixes are 24 bits or shorter.
[Figure: histogram of the number of prefixes at each prefix length from 1 to 32; length 24 dominates. Source: Geoff Huston, Oct 2001.]
Nick McKeown 35
24-8 Direct Lookup Trie
[Figure: a two-level lookup structure: the first 24 address bits directly index a table with 2^24 entries (0 to 2^24 - 1); for prefixes longer than /24, the entry points to a second-level table indexed by the remaining 8 bits (0 to 2^8 - 1).]
When pipelined, this allows one lookup per memory access. It is an inefficient use of memory, though.
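A rough sketch of how such a 24-8 table can be filled and queried; the encoding, helper names, and the assumption that routes are inserted shortest-prefix first are my own simplifications, not the lecture's design:

```python
TBL24_SIZE = 1 << 24                      # 16M entries: the memory inefficiency
tbl24 = [None] * TBL24_SIZE               # next hop, or ("L2", index) marker
tbl_long = []                             # second-level 256-entry tables

def add_route(prefix: int, plen: int, next_hop: str) -> None:
    if plen <= 24:
        base = prefix >> 8
        for i in range(base, base + (1 << (24 - plen))):
            if not isinstance(tbl24[i], tuple):        # keep existing /25+ chunks
                tbl24[i] = next_hop
    else:
        idx = prefix >> 8
        if not isinstance(tbl24[idx], tuple):
            tbl_long.append([tbl24[idx]] * 256)        # inherit the shorter route
            tbl24[idx] = ("L2", len(tbl_long) - 1)
        second = tbl_long[tbl24[idx][1]]
        low = prefix & 0xFF
        for i in range(low, low + (1 << (32 - plen))):
            second[i] = next_hop

def lookup(addr: int) -> str:
    entry = tbl24[addr >> 8]                           # one memory access...
    if isinstance(entry, tuple):
        return tbl_long[entry[1]][addr & 0xFF]         # ...plus one more if needed
    return entry

def ip(s: str) -> int:
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

add_route(ip("128.9.0.0"), 16, "R3")      # shortest prefixes first (assumed)
add_route(ip("128.9.176.0"), 25, "R6")
print(lookup(ip("128.9.16.14")), lookup(ip("128.9.176.5")))   # -> R3 R6
```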
Nick McKeown 38
Outline
Background: What is a router? Why do we need faster routers? Why are they hard to build?
Architectures and techniques: the evolution of router architecture, IP address lookup, packet buffering, switching.
The Future
Nick McKeown 40
Conceptual architecture
[Figure: linecards hosting one or more bidirectional ports, connected to one or more non-blocking switching cores under an arbitration/control unit.]
Nick McKeown 41
Conceptual Packet Buffering
[Figure: the same conceptual architecture with packet buffers placed at the inputs (input buffering).]
Nick McKeown 43
Arbitration
[Figure: an input-queued router: N input linecards, each with header processing and packet queues, connected through a switch to N outputs; an arbitration block decides, in each time slot, which inputs may send packets to which outputs.]
Nick McKeown 44
Head of Line Blocking
Nick McKeown 45
A Router with Input Queues
[Figure: average delay vs. offered load (0-100%) for a router with a single FIFO queue per input. Head-of-line blocking limits throughput to 2 - sqrt(2), about 58% of the best that any queueing system can achieve.]
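A rough simulation sketch, not from the lecture, that reproduces this limit under saturated uniform random traffic:

```python
import random

# Head-of-line blocking in an N x N switch with one FIFO per input, inputs
# always backlogged, destinations uniform at random. For large N the
# throughput tends to 2 - sqrt(2) ~= 0.586; small N gives somewhat more.
def hol_throughput(n=32, slots=20000, seed=1):
    random.seed(seed)
    hol = [random.randrange(n) for _ in range(n)]   # destination of each HOL packet
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():      # each output serves one input
            winner = random.choice(inputs)
            delivered += 1
            hol[winner] = random.randrange(n)       # saturated input: new packet
    return delivered / (n * slots)

print(hol_throughput())   # roughly 0.59 for n = 32
```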
Nick McKeown 46
The Best Performance
[Figure: average delay vs. offered load (0-100%) for the best that any queueing system can achieve: the delay only blows up as the load approaches 100%.]
Nick McKeown 47
Conceptual Packet Buffering
[Figure: the conceptual architecture with a single central (shared) buffer inside the switching core.]
Nick McKeown 48
Fast Packet Buffers (http://yuba.stanford.edu/fastbuffers/)
Example: 40 Gb/s packet buffer. Size = RTT * BW = 10 Gb; 40-byte packets.
Write rate R: 1 packet every 8 ns. Read rate R: 1 packet every 8 ns.
[Figure: a buffer manager in front of buffer memory, written and read at rate R.]
Use SRAM? + fast enough random access time, but - too low density to store 10 Gb of data.
Use DRAM? + high density means we can store the data, but - too slow (40 ns random access time).
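The slide's numbers are consistent with each other (again assuming back-to-back 40-byte packets):

$$ \frac{40 \times 8\ \text{bit}}{40\ \text{Gb/s}} = 8\ \text{ns per packet}, \qquad \frac{10\ \text{Gb}}{40\ \text{Gb/s}} = 0.25\ \text{s} \approx \text{one round-trip time.} $$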
Nick McKeown 49
DRAM Buffer Memory with Packet Caches
[Figure: the buffer manager keeps small SRAM caches of the heads and tails of the Q FIFO queues: arriving packets are written into the tail cache and departing packets are read from the head cache, while the bulk of each queue sits in DRAM buffer memory. Packets are moved between SRAM and DRAM b >> 1 packets at a time, so the slow DRAM sees only one large access per b packets.]
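A toy sketch of the ingress (tail-cache) half of this idea; the class, block size, and "DRAM write log" are illustrative assumptions of mine:

```python
from collections import deque

B = 4   # block size b; in practice set by the DRAM access time vs. line rate

class TailCache:
    def __init__(self):
        self.sram = {}           # queue id -> packets waiting in SRAM
        self.dram_writes = []    # log of (queue id, block) writes to DRAM

    def enqueue(self, q, pkt):
        self.sram.setdefault(q, deque()).append(pkt)
        if len(self.sram[q]) >= B:                        # enough for one burst
            block = [self.sram[q].popleft() for _ in range(B)]
            self.dram_writes.append((q, block))           # one access moves b packets

tc = TailCache()
for i in range(10):
    tc.enqueue("q0", i)
print(len(tc.dram_writes), len(tc.sram["q0"]))   # -> 2 blocks written, 2 packets left
```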
Nick McKeown 50
Conceptual Packet Buffering
[Figure: the conceptual architecture with input buffering organised as virtual output queues: each input keeps a separate queue per output.]
Nick McKeown 51
Output buffering
[Figure: the generic datapath with the packet buffers placed at the outputs: each packet is switched immediately after header processing and queued in the buffer memory of its output linecard.]
Nick McKeown 54
Speed-up
[Figure: to buffer packets at the outputs, both the switch and each output's buffer memory must run at N times the line rate, since in the worst case all N inputs send to the same output at once.]
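As a concrete, made-up example of what this speed-up means:

$$ N = 16 \ \text{ports at}\ 40\ \text{Gb/s} \;\Rightarrow\; \text{each output buffer must absorb up to } 16 \times 40 = 640\ \text{Gb/s of writes, plus } 40\ \text{Gb/s of reads.} $$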
Nick McKeown 55
Conceptual Packet Buffering
[Figure: the conceptual architecture again, with input buffering organised as virtual output queues.]
Nick McKeown 56
Virtual Output Queues
Nick McKeown 57
Matching
A matching on a graph is a subset of its edges such that no two of them share a vertex.
[Figure: a bipartite graph of input and output vertices, with a set of edges forming a matching highlighted.]
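One simple way to pick such a matching for a VOQ switch is a greedy maximal matching over the requests. This sketch is only illustrative; real schedulers such as iSLIP or maximum-weight matching are more sophisticated:

```python
# Greedy maximal matching over the "request graph": input i requests output j
# if its virtual output queue VOQ[i][j] is non-empty.
def greedy_matching(requests):
    """requests: dict input -> set of outputs it has packets for."""
    taken_outputs = set()
    match = {}
    for inp in sorted(requests):                 # fixed order keeps it simple
        for out in sorted(requests[inp]):
            if out not in taken_outputs:
                match[inp] = out                 # grant: no other input gets `out`
                taken_outputs.add(out)
                break
    return match

voqs = {0: {1, 2}, 1: {2}, 2: {0, 2}}
print(greedy_matching(voqs))   # -> {0: 1, 1: 2, 2: 0}, no output used twice
```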
Nick McKeown 59
A Router with Virtual Output Queues
[Figure: average delay vs. offered load (0-100%) for a router with virtual output queues: with a good matching algorithm it tracks the best that any queueing system can achieve, right up to 100% load.]
Nick McKeown 67
Current Internet Router Technology
Summary
There are three potential bottlenecks: Address lookup, Packet buffering, and Switching.
Techniques exist today for: 100+Tb/s Internet routers, with 140Gb/s linecards.
But what comes next…?
Nick McKeown 68
Outline
Background: What is a router? Why do we need faster routers? Why are they hard to build?
Architectures and techniques: the evolution of router architecture, IP address lookup, packet buffering, switching.
The Future: more parallelism, eliminating schedulers, introducing optics into routers.
Nick McKeown 72
Complex linecards
Typical IP Router Linecard
[Figure: linecard blocks: optics, physical layer, framing & maintenance, packet processing with lookup tables, buffer management & scheduling with buffer & state memory, connected to the switch fabric and its arbitration.]
10 Gb/s linecard: number of gates: 30M; amount of memory: 2 Gbits; cost: >$20k; power: 300W.
Nick McKeown 73
External Parallelism: Multiple Parallel Routers
[Figure: what we’d like: a single NxN IP router with a capacity of 100s of Tb/s; the building blocks we’d like to use: several smaller routers of capacity R.]
Nick McKeown 74
Multiple Parallel Routers with Load Balancing
[Figure: external traffic at rate R is spread over k parallel routers (1, 2, ..., k), so each internal link only has to run at rate R/k.]
Nick McKeown 75
Intelligent Packet Load-balancing
Parallel Packet Switching
[Figure: N external ports at rate R; a bufferless demultiplexor at each input spreads packets across k parallel routers whose internal links run at rate R/k; a multiplexor at each output recombines them.]
Nick McKeown 76
Parallel Packet Switching
Advantages: a single stage of buffering; no excess link capacity; per-subsystem power, memory bandwidth, and lookup rate each reduced by a factor of k.
Nick McKeown 77
Parallel Packet Switching
Advantages: load-balancing (output links are less congested); scalability (a new router can be dynamically added); redundancy.
Nick McKeown 78
Parallel Packet Switch: Theorem
If the speed-up is at least 2k/(k+2), which approaches 2 as k grows, then a parallel packet switch can precisely emulate a single big router.
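Plugging a small k into the bound, purely as an illustration:

$$ k = 8: \quad \frac{2k}{k+2} = \frac{16}{10} = 1.6 $$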
Nick McKeown 80
Eliminating Schedulers: the Two-Stage Switch [Chang et al., 2001]
[Figure: N external inputs connect through a first round-robin stage (load balancing) to N internal inputs, which connect through a second round-robin stage to the N external outputs; neither stage needs a scheduler.]
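A sketch of such a fixed, schedulerless connection pattern; the cyclic-shift permutation used here is one common choice and an assumption of mine, not necessarily the exact sequence in Chang et al.:

```python
# In time slot t, external input i connects to internal input (i + t) mod N in
# the first stage, and internal input j connects to external output (j + t)
# mod N in the second stage. Over any N consecutive slots, every input is
# connected to every internal input (and every internal input to every
# output) exactly once, with no scheduler involved.
N = 4

def stage1(t):
    return {i: (i + t) % N for i in range(N)}   # external input -> internal input

def stage2(t):
    return {j: (j + t) % N for j in range(N)}   # internal input -> external output

for t in range(3):
    print(t, stage1(t), stage2(t))
```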
Nick McKeown 83
Optics in routers
[Figure: linecards connected to the switch core by optical links.]
Nick McKeown 84
Replacing the switch fabric with optics
Candidate technologies:
1. MEMS.
2. Fast tunable lasers + passive optical couplers.
3. Diffraction waveguides.
4. Electroholographic materials.
Nick McKeown 85
The Stanford Phicticious Optical Router
100 Tb/s IP router: 625 linecards, each operating at 160 Gb/s.
[Figure: linecards #1 through #625, each terminating 4 x 40 Gb/s lines (160 Gb/s of line termination, IP packet processing, and packet buffering), connected at 160-320 Gb/s to an optical 2-stage switch.]
Nick McKeown 90
References
General
1. J. S. Turner, “Design of a Broadcast Packet Switching Network”, IEEE Trans. Comm., June 1988, pp. 734-743.
2. C. Partridge et al., “A Fifty Gigabit per Second IP Router”, IEEE Trans. Networking, 1998.
3. N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, “The Tiny Tera: A Packet Switch Core”, IEEE Micro Magazine, Jan-Feb 1997.
Fast Packet Buffers
1. Sundar Iyer, Ramana Rao, Nick McKeown, “Design of a Fast Packet Buffer”, IEEE HPSR 2001, Dallas.
Nick McKeown 91
References
IP Lookups
1. A. Brodnik, S. Carlsson, M. Degermark, S. Pink, “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp. 3-14.
2. B. Lampson, V. Srinivasan, G. Varghese, “IP Lookups Using Multiway and Multicolumn Search”, Infocom 1998, pp. 1248-1256, vol. 3.
3. M. Waldvogel, G. Varghese, J. Turner, B. Plattner, “Scalable High Speed IP Routing Lookups”, Sigcomm 1997, pp. 25-36.
4. P. Gupta, S. Lin, N. McKeown, “Routing Lookups in Hardware at Memory Access Speeds”, Infocom 1998, pp. 1241-1248, vol. 3.
5. S. Nilsson, G. Karlsson, “Fast Address Lookup for Internet Routers”, IFIP Intl. Conf. on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.
6. V. Srinivasan, G. Varghese, “Fast IP Lookups Using Controlled Prefix Expansion”, Sigmetrics, June 1998.
Nick McKeown 92
References
Switching
• N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch”, IEEE Transactions on Communications, 47(8), Aug 1999.
• A. Mekkittikul and N. W. McKeown, “A Practical Algorithm to Achieve 100% Throughput in Input-Queued Switches”, in Proc. IEEE INFOCOM ’98, March 1998.
• L. Tassiulas, “Linear Complexity Algorithms for Maximum Throughput in Radio Networks and Input Queued Switches”, in Proc. IEEE INFOCOM ’98, San Francisco, CA, April 1998.
• D. Shah, P. Giaccone and B. Prabhakar, “An Efficient Randomized Algorithm for Input-Queued Switch Scheduling”, in Proc. Hot Interconnects 2001.
• J. Dai and B. Prabhakar, “The Throughput of Data Switches With and Without Speedup”, in Proc. IEEE INFOCOM ’00, Tel Aviv, Israel, March 2000, pp. 556-564.
• C.-S. Chang, D.-S. Lee, Y.-S. Jou, “Load Balanced Birkhoff-von Neumann Switches”, in Proc. IEEE HPSR ’01, May 2001, Dallas, Texas.
Nick McKeown 93
References
Future
• C.-S. Chang, D.-S. Lee, Y.-S. Jou, “Load Balanced Birkhoff-von Neumann Switches”, in Proc. IEEE HPSR ’01, May 2001, Dallas, Texas.
• Pablo Molinero-Fernández, Nick McKeown, “TCP Switching: Exposing Circuits to IP”, Hot Interconnects IX, Stanford University, August 2001.
• S. Iyer, N. McKeown, “Making Parallel Packet Switches Practical”, in Proc. IEEE INFOCOM ’01, April 2001, Alaska.
• I. Keslassy et al., “Scaling Internet Routers Using Optics”, in Proc. ACM SIGCOMM ’03, August 2003, Germany.