Extreme Networking: Achieving Nonstop Network Operation Under Extreme Operating Conditions
Jon Turner
http://www.arl.wustl.edu/arl
DARPA PI Meeting, January 27-29, 2003
2 - Jonathan Turner – January 27-29, 2003
Project Overview
Motivation:
» data networks have become a mission-critical resource
» networks often subject to extreme traffic conditions
» need to design networks for worst-case conditions
» technology advances making extreme defenses practical
Extreme network services:
» Lightweight Flow Setup (LFS)
» Network Access Service (NAS)
» Reserved Tree Service (RTS)
Key router technology components:
» Super-Scalable Packet Scheduling (SPS)
» Dynamic Queues with Auto-aggregation (DQA)
» Scalable Distributed Queueing (SDQ)
Prototype Extreme Router
[Figure: switch fabric with six ports, each with input and output port processors (IPP/OPP), a Field Programmable Port Extender (FPX), a Smart Port Card (SPC) and a line card; plus a control processor]
Prototype Extreme Router: Field Programmable Port Extenders
[Figure: router diagram highlighting the FPX on each port and the ATM switch core; each FPX has a Network Interface Device and a Reprogrammable Application Device, with SDRAM (128 MB) and SRAM (4 MB)]
Prototype Extreme Router: Embedded Processors
[Figure: router diagram highlighting Smart Port Card 2 (SPC-2): Pentium with cache, North Bridge, 128MB, FlashDisk, FPGA and APIC]
Prototype Extreme Router: Gigabit Ethernet
[Figure: router diagram highlighting the Gigabit Ethernet line card, with GBIC, framer and FPGA]
Performance of SPC-2
[Figure: forwarding rate (Mb/s, 0 to 350) vs. packet size (0 to 1,400 bytes), SPC-1 vs. SPC-2; 2.5x improvement for average packet lengths]
[Figure: forwarding rate (packets/s, 0 to 250,000) vs. packet size (0 to 1,400 bytes), SPC-1 vs. SPC-2; 2.6x improvement for average packet lengths]
Largest gain at small packet sizes. PCI bus limits performance for large packets.
More SPC-2 Performance
[Figure, 1500 byte packets: forwarding rate (Mb/s, 0 to 350) vs. input rate (Mb/s, 0 to 500), SPC-1 vs. SPC-2; annotated rates 22K packets/sec and 15K packets/sec]
[Figure, 32 byte packets: forwarding rate (Mb/s, 0 to 100) vs. input rate (Mb/s, 0 to 250), SPC-1 vs. SPC-2; annotated rates 220K packets/sec and 78K packets/sec]
Throughput loss at high loads due to PCI bus contention and input priority.
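As a rough consistency check, the annotated packet rates can be turned into bit rates by multiplying by the packet size. This is an illustration only. The function name is hypothetical, and it counts packet bytes alone, ignoring ATM cell headers, padding and framing (an assumption).

```python
def pkts_to_mbps(pkts_per_sec: float, packet_bytes: int) -> float:
    """Bit rate in Mb/s implied by a packet rate, counting packet bytes only."""
    # Assumption: ATM cell headers, padding and framing overhead are ignored.
    return pkts_per_sec * packet_bytes * 8 / 1e6


# Annotated packet rates from the two charts, with their packet sizes.
for pps, size in [(22_000, 1500), (15_000, 1500), (220_000, 32), (78_000, 32)]:
    print(f"{pps:>7,} pkts/s x {size:>4} bytes = {pkts_to_mbps(pps, size):6.1f} Mb/s")
```

The results are about 264 and 180 Mb/s for the 1500-byte annotations and about 56 and 20 Mb/s for the 32-byte ones. All four fall within the plotted axis ranges.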
Field Programmable Port Extender (FPX)
Network Interface Device (NID) routes cells to/from RAD.
Reprogrammable Application Device (RAD) functions:
» will implement core router functions in extensible router
» may also implement arbitrary packet processing functions
Functions for extreme router:
» high speed packet storage manager
» packet classification & route lookup
– fast route lookup
– exact match filters
– 32 general filters
» flexible queue manager
– per-flow queues for reserved flows
– route packets to/from SPC
[Figure: FPX block diagram: NID with two 2 Gb/s interfaces; RAD (400K gates + 80 KB) with two SRAMs (1 MB, 36-bit) and two SDRAMs (64 MB, 64-bit), 100 MHz, 6.4 Gb/s]
Logical Port Architecture
[Figure, input side processing: FPX and SPC; packet classification & route lookup, reassembly contexts, special flow queues, virtual output queues, DQ; PCU and plugins]
[Figure, output side processing: FPX and SPC; packet classification, reassembly contexts, special flow queues, output queues, RC; PCU and plugins]
FPX Packet Processor Block Diagram
[Figure: control path with control cell processor, route & filter updates, register set updates & status, and DQ status & rate control; data path with ISAR (from switch, from line card), header processing, classification and route lookup (SRAMs), queue manager, packet storage manager (includes free space list; SDRAMs; header and discard pointers) and OSAR (to switch, to line card)]
Classification and Route Lookup (CARL)
Three lookup engines:
» route lookup for routing datagrams: best prefix match
» flow filters for multicast & reserved flows: exact match
» general filters (32) for management: exhaustive search
Input processing:
» parallel check of all three
» return the highest priority exclusive match and the highest priority non-exclusive match
» general filters have unique priorities
» all flow filters share a single priority
» likewise, all routes share a single priority
[Figure: input demux feeding route lookup, flow filters and general filters in parallel, with a bypass path for headers, into result processing & priority resolution]
Output processing:
» no route lookup on output
Route lookup & flow filters share off-chip SRAM.
General filters are processed on-chip.
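The priority resolution step can be sketched in a few lines. This is a software model under stated assumptions, not the FPGA logic. `Match`, `resolve` and the numeric priorities are hypothetical, and a larger number is taken to mean higher priority. The model encodes the rules above:
» all flow filters share one priority, and all routes share another
» each general filter has its own priority
» the result is the best exclusive match plus the best non-exclusive match

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    engine: str        # "route", "flow" or "general"
    priority: int      # assumption: a larger value means higher priority
    exclusive: bool
    action: str        # hypothetical label for what the match selects


# Hypothetical priority values: one shared priority for all flow filters,
# one for all routes; each general filter carries its own unique priority.
FLOW_PRIORITY = 40
ROUTE_PRIORITY = 10


def resolve(matches: list[Match]) -> tuple[Optional[Match], Optional[Match]]:
    """Return the highest-priority exclusive and non-exclusive matches."""
    exclusive = [m for m in matches if m.exclusive]
    non_exclusive = [m for m in matches if not m.exclusive]
    best_excl = max(exclusive, key=lambda m: m.priority, default=None)
    best_non_excl = max(non_exclusive, key=lambda m: m.priority, default=None)
    return best_excl, best_non_excl


# The hardware checks the three engines in parallel; here their results are
# simply collected into one list before resolution.
results = [
    Match("route", ROUTE_PRIORITY, True, "forward to output 3"),
    Match("flow", FLOW_PRIORITY, True, "reserved flow queue 17"),
    Match("general", 55, False, "copy to monitoring SPC"),
]
excl, non_excl = resolve(results)
print(excl.action, "/", non_excl.action)
```

With these values, the flow filter wins among the exclusive matches and the monitoring filter wins among the non-exclusive ones. Ties between equal priorities go to the earliest entry in the list, because `max` returns the first maximal element.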
Exact Match Lookup
Exact match lookup table used for reserved flows:
» includes LFS flows, signaled QoS flows and multicast
» and flows requiring processing by SPCs
» each of these flows has a separate queue in the QM
» multicast flows have two queues (recycling multicast)
» implemented using hashing
[Figure: hash table spanning on-chip SRAM and off-chip SRAM (tag+data entries), indexed by a simple hash of packet fields (src, dst), with ingress valid and egress valid bits]
Entry format:
» tag = [src, dst, sport, dport, proto]
» data includes 2 outputs + 2 QIDs, LFS rates, packet and byte counters, and flags
Separate memory areas for ingress and egress packets.
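Here is a sketch of the exact-match table. It assumes a chained-bucket layout and a made-up hash; `simple_hash`, `ExactMatchTable` and `FlowEntry` are hypothetical names. The hash only chooses a bucket. Comparing the full stored 5-tuple tag is what makes the match exact. Separate tables stand in for the separate ingress and egress memory areas, and each hit updates the packet and byte counters.

```python
from dataclasses import dataclass
from typing import Optional

FiveTuple = tuple[int, int, int, int, int]  # (src, dst, sport, dport, proto)


@dataclass
class FlowEntry:
    outputs: tuple[int, int]   # "2 outputs"
    qids: tuple[int, int]      # "2 QIDs"
    lfs_rate: int = 0          # hypothetical encoding of the LFS rate
    packets: int = 0
    byte_count: int = 0
    flags: int = 0


def simple_hash(tag: FiveTuple, num_buckets: int) -> int:
    """Hypothetical hash; the slide only says "simple hash"."""
    h = 0
    for value in tag:
        h = ((h * 31) ^ value) & 0xFFFFFFFF
    return h % num_buckets


class ExactMatchTable:
    """The hash picks a bucket; the stored tag confirms the match."""

    def __init__(self, num_buckets: int = 1024):
        self.buckets: list[list[tuple[FiveTuple, FlowEntry]]] = [
            [] for _ in range(num_buckets)
        ]

    def insert(self, tag: FiveTuple, entry: FlowEntry) -> None:
        bucket = self.buckets[simple_hash(tag, len(self.buckets))]
        for i, (stored_tag, _) in enumerate(bucket):
            if stored_tag == tag:
                bucket[i] = (tag, entry)       # replace an existing entry
                return
        bucket.append((tag, entry))

    def lookup(self, tag: FiveTuple, packet_len: int) -> Optional[FlowEntry]:
        for stored_tag, entry in self.buckets[simple_hash(tag, len(self.buckets))]:
            if stored_tag == tag:              # full 5-tuple comparison
                entry.packets += 1
                entry.byte_count += packet_len
                return entry
        return None


# Separate memory areas for ingress and egress packets.
tables = {"ingress": ExactMatchTable(), "egress": ExactMatchTable()}
flow = (0x0A000001, 0x0A000002, 5000, 80, 6)
tables["ingress"].insert(flow, FlowEntry(outputs=(2, 5), qids=(130, 131)))
hit = tables["ingress"].lookup(flow, 1500)
miss = tables["egress"].lookup(flow, 1500)
```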
General Filter Match
General filter match considers the full 5-tuple:
» prefix match on source and destination addresses
» range match on source and destination ports
» exact or wildcard match on protocol
» each filter has a priority and may be exclusive or non-exclusive
Intended primarily for management filters:
» firewall filters
» class-based monitoring
» class-based special processing
Implemented using parallel exhaustive search:
» limit of 32 filters
[Figure: filter memory feeding a bank of parallel matchers]
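The matching rules can be modeled in software as follows. This is a sketch, not the hardware matcher. The class and function names are hypothetical, and Python's standard `ipaddress` module stands in for the prefix comparators. The loop in `match_all` tests every filter in turn; the parallel matchers do the same work in a single step.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import Optional

MAX_FILTERS = 32  # limit stated on the slide


@dataclass(frozen=True)
class GeneralFilter:
    src: IPv4Network            # prefix match on source address
    dst: IPv4Network            # prefix match on destination address
    sport: tuple[int, int]      # inclusive range on source port
    dport: tuple[int, int]      # inclusive range on destination port
    proto: Optional[int]        # exact protocol number, or None for wildcard
    priority: int               # unique per filter
    exclusive: bool

    def matches(self, src: str, dst: str, sport: int, dport: int, proto: int) -> bool:
        return (
            IPv4Address(src) in self.src
            and IPv4Address(dst) in self.dst
            and self.sport[0] <= sport <= self.sport[1]
            and self.dport[0] <= dport <= self.dport[1]
            and (self.proto is None or self.proto == proto)
        )


def match_all(filters: list[GeneralFilter], pkt: tuple) -> list[GeneralFilter]:
    """Exhaustive search: every filter is tested against the packet."""
    if len(filters) > MAX_FILTERS:
        raise ValueError(f"at most {MAX_FILTERS} general filters are supported")
    return [f for f in filters if f.matches(*pkt)]


ANY = IPv4Network("0.0.0.0/0")
ALL_PORTS = (0, 65535)
filters = [
    # firewall filter: block telnet (TCP port 23) into 10.0.0.0/8
    GeneralFilter(ANY, IPv4Network("10.0.0.0/8"), ALL_PORTS, (23, 23), 6, 100, True),
    # class-based monitoring: all UDP traffic, any address
    GeneralFilter(ANY, ANY, ALL_PORTS, ALL_PORTS, 17, 50, False),
]
telnet_hits = match_all(filters, ("192.0.2.7", "10.1.2.3", 40000, 23, 6))
dns_hits = match_all(filters, ("192.0.2.7", "10.1.2.3", 40000, 53, 17))
```

The list of matching filters then goes to priority resolution, which keeps the highest-priority exclusive and non-exclusive matches.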
Fast IP Lookup (Eatherton & Dittia)
Multibit trie with clever data encoding:
» small memory requirements (<7 bytes per prefix)
» small memory bandwidth and simple lookup yield fast lookup rates
» updates have negligible impact on lookup performance
Avoid impact of external memory latency on throughput by interleaving several concurrent lookups:
» 8 lookup engine configuration uses about 6% of Virtex 2000E logic cells
[Figure: example multibit trie lookup, showing each node's internal bit vector and external bit vector]
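To make the bit-vector encoding concrete, here is a small software multibit trie in the style of the Tree Bitmap scheme. It rests on three assumptions:
» a 3-bit stride
» addresses given as '0'/'1' strings
» Python lists in place of the compact SRAM layout, so it does not reproduce the <7 bytes per prefix figure

Each node keeps two bit vectors. The internal one marks prefixes that end inside the node; the external one marks which children exist. Counting the set bits below a position gives the index of the matching result or child, which is what lets results and children be stored contiguously. All names are hypothetical.

```python
STRIDE = 3  # assumption: bits consumed per trie node


class Node:
    def __init__(self) -> None:
        self.internal = 0    # bit p set: a prefix ends inside this node at position p
        self.external = 0    # bit c set: a child exists for next chunk value c
        self.results = []    # next hops, ordered by internal bit position
        self.children = []   # child nodes, ordered by chunk value


def rank(bitmap: int, pos: int) -> int:
    """Count set bits below `pos`: the index into results/children."""
    return bin(bitmap & ((1 << pos) - 1)).count("1")


def internal_pos(bits: str) -> int:
    """Level-order position of a 0..STRIDE-1 bit prefix (length 0 is position 0)."""
    return (1 << len(bits)) - 1 + (int(bits, 2) if bits else 0)


def insert(root: Node, prefix: str, next_hop: str) -> None:
    node = root
    while len(prefix) >= STRIDE:            # walk or create full-stride children
        chunk = int(prefix[:STRIDE], 2)
        idx = rank(node.external, chunk)
        if not (node.external >> chunk) & 1:
            node.external |= 1 << chunk
            node.children.insert(idx, Node())
        node = node.children[idx]
        prefix = prefix[STRIDE:]
    pos = internal_pos(prefix)
    idx = rank(node.internal, pos)
    if (node.internal >> pos) & 1:
        node.results[idx] = next_hop        # replace an existing route
    else:
        node.internal |= 1 << pos
        node.results.insert(idx, next_hop)


def lookup(root: Node, addr: str):
    """Longest-prefix match on a '0'/'1' address string."""
    node, best = root, None
    while node is not None:
        chunk = addr[:STRIDE]
        # The longest prefix ending inside this node beats shorter ones.
        for length in range(min(STRIDE - 1, len(chunk)), -1, -1):
            pos = internal_pos(chunk[:length])
            if (node.internal >> pos) & 1:
                best = node.results[rank(node.internal, pos)]
                break
        if len(chunk) < STRIDE:
            break
        c = int(chunk, 2)
        if (node.external >> c) & 1:
            node = node.children[rank(node.external, c)]
            addr = addr[STRIDE:]
        else:
            node = None
    return best


root = Node()
for prefix, hop in [("", "default"), ("1", "C"), ("101", "A"), ("10110", "B")]:
    insert(root, prefix, hop)
```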
Lookup Throughput
[Figure: millions of lookups per second (0 to 12) vs. number of lookup engines (1 to 8), for Mae-West single tree, Mae-West split tree and worst-case single tree; SRAM bandwidth 450 MB/s]
Linear throughput gain. Split tree cuts storage by 30%.
Update Performance
[Figure: millions of lookups per second (0 to 12) vs. number of lookup engines (1 to 8), with no updates (single tree), 10^5 updates/second and 10^6 updates/second; annotation "1 update per s"]
Reasonable update rates have little impact.
Queue Manager Logical View (QM)
[Figure: arriving packets go to SPC queues (SPC packet scheduler, to and from SPC), reserved flow queues and datagram queues (DQ; to output 0, to output 1, ..., to output 8), served by a VOQ packet scheduler to the switch; on the output side, datagram queues and reserved flow queues feed a link packet scheduler to the link]
» 64 hashed datagram queues for traffic isolation
» separate queues for each reserved flow
» separate queue set for each output
» separate queue for each SPC flow
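Queue selection on arrival can be sketched as below. All names are hypothetical, and CRC-32 (from Python's `zlib`) stands in for whatever hash the hardware uses. Packets of reserved or SPC-bound flows go to their own queue. All other datagrams are hashed into one of 64 queues in the output's queue set, so a misbehaving flow shares its queue only with flows that hash to the same place.

```python
import zlib

NUM_DATAGRAM_QUEUES = 64  # "64 hashed datagram queues for traffic isolation"


def select_queue(output: int, flow_key: tuple, per_flow_qids: dict) -> tuple:
    """Return (output, kind, queue index) for an arriving packet.

    per_flow_qids maps reserved (or SPC) flows to their own queue IDs; every
    other packet is hashed into one of the output's datagram queues.
    """
    if flow_key in per_flow_qids:
        return (output, "flow", per_flow_qids[flow_key])
    digest = zlib.crc32(repr(flow_key).encode())
    return (output, "datagram", digest % NUM_DATAGRAM_QUEUES)


reserved = {("10.0.0.1", "10.0.0.2", 5000, 80, 6): 130}   # hypothetical QID
print(select_queue(3, ("10.0.0.1", "10.0.0.2", 5000, 80, 6), reserved))
print(select_queue(3, ("192.0.2.9", "10.0.0.2", 1234, 53, 17), reserved))
```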
Backlogged TCP Flows with Tail Discard
[Figure: buffer level (0 to 20,000) vs. time (50 to 200 s); tail drop, 20K buffer, 100 ms RTT, 100 sources, 500 Mb/s link]
[Figure: buffer level (0 to 1,000) vs. time (50 to 60 s); tail drop, 1K buffer, 100 ms RTT, 100 sources, 500 Mb/s link]
With large buffers, get large delay variance; with small buffers, get underflow and low throughput.
DRR with Discard from Longest Queue
Smaller fluctuations, but still significant.
[Figure: buffer level (0 to 20,000) vs. time (50 to 200 s); DRR, 20K buffer, 100 ms RTT, 100 sources, 500 Mb/s link]
[Figure: buffer level (0 to 1,000) vs. time (50 to 60 s); DRR, 1K buffer, 100 ms RTT, 100 sources, 500 Mb/s link]
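For reference, here is a compact model of deficit round robin with discard from the longest queue. It is a sketch under stated assumptions: sizes and the buffer limit are in bytes, the discarded packet is the one at the tail of the longest queue, and all names are hypothetical.

```python
from collections import deque


class DRRLongestQueueDiscard:
    """Deficit round robin with discard from the longest queue (bytes)."""

    def __init__(self, num_queues: int, quantum: int, buffer_limit: int):
        self.queues = [deque() for _ in range(num_queues)]
        self.backlog = [0] * num_queues     # bytes queued per queue
        self.deficit = [0] * num_queues
        self.quantum = quantum
        self.limit = buffer_limit
        self.total = 0
        self.active = deque()               # round-robin order of non-empty queues
        self.in_turn = False                # has active[0] received its quantum?

    def enqueue(self, q: int, size: int) -> None:
        if not self.queues[q]:
            self.active.append(q)
        self.queues[q].append(size)
        self.backlog[q] += size
        self.total += size
        while self.total > self.limit:      # buffer full: discard from longest
            self._discard_from_longest()

    def _discard_from_longest(self) -> None:
        victim = max(range(len(self.queues)), key=lambda i: self.backlog[i])
        size = self.queues[victim].pop()    # assumption: drop the tail packet
        self.backlog[victim] -= size
        self.total -= size
        if not self.queues[victim]:
            self._deactivate(victim)

    def _deactivate(self, q: int) -> None:
        if self.active and self.active[0] == q:
            self.in_turn = False
        self.active.remove(q)
        self.deficit[q] = 0

    def dequeue(self):
        """Return (queue, size) of the next packet to send, or None if empty."""
        while self.active:
            q = self.active[0]
            if not self.in_turn:            # start of this queue's turn
                self.deficit[q] += self.quantum
                self.in_turn = True
            size = self.queues[q][0]
            if size <= self.deficit[q]:
                self.queues[q].popleft()
                self.deficit[q] -= size
                self.backlog[q] -= size
                self.total -= size
                if not self.queues[q]:
                    self._deactivate(q)
                return q, size
            self.active.rotate(-1)          # not enough credit: next queue's turn
            self.in_turn = False
        return None


sched = DRRLongestQueueDiscard(num_queues=2, quantum=500, buffer_limit=3000)
for _ in range(5):
    sched.enqueue(0, 500)                   # queue 0: an aggressive source
for _ in range(2):
    sched.enqueue(1, 500)
order = []
while (pkt := sched.dequeue()) is not None:
    order.append(pkt[0])
```

In the example, five 500-byte packets from an aggressive source fill queue 0. When the second packet arrives at queue 1, the buffer overflows. Queue 0 is the longest, so it loses a packet, and the two queues then alternate until queue 1 drains.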
Queue State DRR (QSDRR)
Add hysteresis to packet discard policy:
» keep discarding from the same queue until it becomes the shortest non-empty queue
[Figure: buffer level (900 to 1,000) vs. time (50 to 60 s); QSDRR, 1K buffer, 100 ms RTT, 100 sources, 500 Mb/s link]
[Figure: buffer level (19,000 to 20,000) vs. time (50 to 200 s); QSDRR, 20K buffer, 100 ms RTT, 100 sources, 500 Mb/s link]
Low variation even with small queues, low delay, no tuning.
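The hysteresis rule can be isolated in a small victim-selection object. This sketch follows the rule as stated above, with hypothetical names and backlogs in arbitrary units. Only the discard decision is modeled; scheduling is still DRR.

```python
class QueueStateDiscard:
    """Victim selection with hysteresis, following the QSDRR rule."""

    def __init__(self) -> None:
        self.target = None   # queue currently being discarded from

    def choose_victim(self, backlog: list[int]) -> int:
        nonempty = [i for i, b in enumerate(backlog) if b > 0]
        if not nonempty:
            raise ValueError("nothing to discard")
        if self.target is not None and backlog[self.target] > 0:
            shortest = min(backlog[i] for i in nonempty)
            if backlog[self.target] > shortest:
                return self.target           # keep discarding from the same queue
        # Target emptied or became the shortest non-empty queue: pick the longest.
        self.target = max(nonempty, key=lambda i: backlog[i])
        return self.target


policy = QueueStateDiscard()
backlog = [10, 6, 3]
victims = []
for _ in range(8):
    v = policy.choose_victim(backlog)
    victims.append(v)
    backlog[v] -= 1   # discard one unit from the chosen queue
```

Queue 0 remains the victim even after queue 1 becomes longer. The policy switches only once queue 0 ties for the shortest non-empty queue. Always discarding from the longest queue would have switched as soon as queue 1 became the longest.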
Packet Scheduling with Approx. Radix Sorting
To implement virtual time schedulers, need to quickly find the queue whose “lead packet” has the smallest virtual finish time:
» using a priority queue, this requires O(log n) time for n queues
Use approximate radix sorting, with compensation: O(1).
» timing wheels with increasing granularity and range
» approximate sorting produces inter-packet timing errors
» observe errors & compensate when next packet scheduled
Fast-forward bits used to skip over empty slots. Scheduler puts no limit on number of queues. Two copies of data structure needed for approx. version of WF2Q+.
[Figure: three timing wheels (wheel 1, wheel 2, wheel 3) with fast-forward bits 00110100, 10000010 and 00101010, feeding an output list]
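A single timing wheel with fast-forward bits is enough to show the core mechanism. This simplified sketch uses hypothetical names and models one wheel only. The design above adds three things it omits:
» chained wheels of increasing granularity and range
» compensation for timing errors
» two copies of the structure for WF2Q+

Finish times are quantized to slots, so entries within one slot come out in arrival order; that quantization is the source of the inter-packet timing error. The fast-forward bits mark non-empty slots, and a rotate plus lowest-set-bit search jumps straight to the next one.

```python
from collections import deque


class TimingWheel:
    """One timing wheel with fast-forward bits (approximate sorting sketch)."""

    def __init__(self, num_slots: int = 8, granularity: float = 1.0):
        self.n = num_slots
        self.g = granularity
        self.slots = [deque() for _ in range(num_slots)]
        self.ff_bits = 0        # bit i set <=> slot i is non-empty
        self.cur = 0            # slot holding virtual time `base`
        self.base = 0.0         # start time of the current slot

    def insert(self, finish_time: float, item) -> None:
        offset = int((finish_time - self.base) // self.g)
        # Clamp: past times go in the current slot; times beyond the wheel's
        # range go in its last slot (a coarser wheel would hold them instead).
        offset = max(0, min(offset, self.n - 1))
        idx = (self.cur + offset) % self.n
        self.slots[idx].append((finish_time, item))
        self.ff_bits |= 1 << idx

    def pop_min(self):
        """Remove and return (finish_time, item) from the next non-empty slot."""
        if self.ff_bits == 0:
            return None
        mask = (1 << self.n) - 1
        # Rotate so the current slot is bit 0, then take the lowest set bit.
        rot = ((self.ff_bits >> self.cur) | (self.ff_bits << (self.n - self.cur))) & mask
        skip = (rot & -rot).bit_length() - 1
        idx = (self.cur + skip) % self.n
        self.cur, self.base = idx, self.base + skip * self.g
        entry = self.slots[idx].popleft()
        if not self.slots[idx]:
            self.ff_bits &= ~(1 << idx)
        return entry


wheel = TimingWheel(num_slots=8, granularity=10)
for t in (35, 12, 18, 71):
    wheel.insert(t, f"pkt{t}")
served = [wheel.pop_min()[0] for _ in range(4)]
```

With this wheel, inserting 19 and then 11 puts both into the same slot, and 19 comes out first. That is an ordering error of less than one slot width.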
Resource Usage Estimates
Key resources in Xilinx FPGAs:
» flip flops: 38,400
» lookup tables (LUTs): 38,400; each can implement any 4-input Boolean function
» block RAMs (4 Kbits each): 160

Module       flops    LUTs   RAMs   flops (%)  LUTs (%)  RAMs (%)
CARL         3,781   5,199     28       9.8      13.5      17.5
CCP          1,500     750      5       3.9       2.0       3.1
FIFOs          159     340     12       0.4       0.9       7.5
ISAR         3,674   5,053     28       9.6      13.2      17.5
OSAR         3,795   3,208     22       9.9       8.4      13.8
PSM          6,196   5,746     20      16.1      15.0      12.5
QM           5,605   6,472     14      14.6      16.9       8.8
Total       24,710  26,768    129      64.3      69.7      80.6
Resources   38,400  38,400    160
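The percentage columns and totals can be recomputed from the per-module counts, which is handy whenever the estimates change. A small script, with hypothetical names:

```python
CAPACITY = (38_400, 38_400, 160)   # flip flops, LUTs, block RAMs

USAGE = {                          # module: (flops, LUTs, block RAMs)
    "CARL": (3_781, 5_199, 28),
    "CCP": (1_500, 750, 5),
    "FIFOs": (159, 340, 12),
    "ISAR": (3_674, 5_053, 28),
    "OSAR": (3_795, 3_208, 22),
    "PSM": (6_196, 5_746, 20),
    "QM": (5_605, 6_472, 14),
}


def percent_of(counts, capacity=CAPACITY):
    """Utilization of each resource type, in percent of the device total."""
    return [100 * n / cap for n, cap in zip(counts, capacity)]


totals = [sum(column) for column in zip(*USAGE.values())]
for name, counts in list(USAGE.items()) + [("Total", tuple(totals))]:
    flops, luts, rams = percent_of(counts)
    print(f"{name:<6} {flops:5.1f}% {luts:5.1f}% {rams:5.1f}%")
```

Running it reproduces the listed percentages, including the 64.3%, 69.7% and 80.6% totals.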
FPGA Performance Characteristics
[Figure: delay (ns, 0 to 25) vs. number of logic levels (LUTs, 0 to 8) for XCV2000e-6 and XCV1000e-7, with maximum and minimum separation marked]
Summary
Version 1 hardware status:
» hardware operating in lab, passing packets
» but still have some bugs to correct
» one day for a typical test-diagnose-correct cycle
» version 1 has a simplified queue manager
Planning several system demos in the next month:
» system level throughput testing, focusing on lookup processing
» verifying basic fair queueing behavior
» TCP SYN attack suppressor
– SPC-resident plugin monitors new TCP connections going to the server
– when there are too many “half-open” connections, the oldest are reset
– flow filters are inserted for stable connections, enabling hardware forwarding
Expect to complete version 2 hardware in the next six months.
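The bookkeeping of the SYN attack suppressor can be sketched as follows. The sketch is hypothetical throughout. No packets are sent or parsed, and the half-open limit is a parameter. The slide does not define "stable", so a connection is treated as stable once its handshake completes, which is an assumption. The oldest half-open connections are the ones reset, and stable connections get a flow filter so the hardware can forward them.

```python
from collections import OrderedDict
from collections.abc import Hashable


class SynAttackSuppressor:
    """Hypothetical sketch of the SPC plugin's bookkeeping (no real packet I/O)."""

    def __init__(self, max_half_open: int):
        self.max_half_open = max_half_open
        self.half_open = OrderedDict()   # connection -> None, oldest first
        self.reset = []                  # connections the plugin would reset
        self.flow_filters = set()        # connections handed to hardware forwarding

    def on_syn(self, conn: Hashable) -> None:
        """A new TCP connection to the protected server is being opened."""
        self.half_open[conn] = None
        self.half_open.move_to_end(conn)
        while len(self.half_open) > self.max_half_open:
            oldest, _ = self.half_open.popitem(last=False)
            self.reset.append(oldest)    # a real plugin would send TCP resets here

    def on_established(self, conn: Hashable) -> None:
        """Connection judged stable: stop tracking it and install a flow filter."""
        if conn in self.half_open:
            del self.half_open[conn]
            self.flow_filters.add(conn)


guard = SynAttackSuppressor(max_half_open=2)
for c in ("a", "b", "c"):   # hypothetical connection identifiers
    guard.on_syn(c)
guard.on_established("b")
```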