
Page 1: Extreme Networking

Extreme Networking: Achieving Nonstop Network Operation Under Extreme Operating Conditions

Jon Turner, jst@cse.wustl.edu
http://www.arl.wustl.edu/arl

DARPA PI Meeting, January 27-29, 2003

Page 2: Project Overview

Motivation
» data networks have become a mission-critical resource
» networks are often subject to extreme traffic conditions
» need to design networks for worst-case conditions
» technology advances are making extreme defenses practical

Extreme network services
» Lightweight Flow Setup (LFS)
» Network Access Service (NAS)
» Reserved Tree Service (RTS)

Key router technology components
» Super-Scalable Packet Scheduling (SPS)
» Dynamic Queues with Auto-aggregation (DQA)
» Scalable Distributed Queueing (SDQ)

Page 3: Prototype Extreme Router

[Figure: a switch fabric interconnecting line cards, each containing an IPP/OPP (input/output port processor) pair, an FPX, and an SPC, with an attached control processor.]


Page 5: Prototype Extreme Router: Field Programmable Port Extenders

[Figure: the same router diagram, highlighting the FPX on each line card. Each FPX pairs a Network Interface Device with a Reprogrammable Application Device, backed by 128 MB of SDRAM and 4 MB of SRAM; the switch fabric is an ATM switch core.]

Page 6: Prototype Extreme Router: Embedded Processors

[Figure: the same router diagram, highlighting the SPC on each line card. The Smart Port Card 2 is an embedded processor subsystem: a Pentium with cache, a north bridge FPGA, an APIC, and a 128 MB flash disk.]

Page 7: Prototype Extreme Router: Gigabit Ethernet

[Figure: the same router diagram, highlighting the line card's Gigabit Ethernet interface: a GBIC optical module, a framer, and an FPGA.]

Page 8: Performance of SPC-2

[Chart: forwarding rate (Mb/s) vs. packet size for SPC-1 and SPC-2: a 2.5x improvement at average packet lengths.]

[Chart: forwarding rate (packets/s) vs. packet size for SPC-1 and SPC-2: a 2.6x improvement at average packet lengths.]

The largest gain is at small packet sizes; the PCI bus limits performance for large packets.

Page 9: More SPC-2 Performance

[Chart: 1500-byte packets, forwarding rate vs. input rate: SPC-2 sustains 22K packets/sec vs. 15K packets/sec for SPC-1.]

[Chart: 32-byte packets, forwarding rate vs. input rate: SPC-2 sustains 220K packets/sec vs. 78K packets/sec for SPC-1.]

Throughput loss at high loads is due to PCI bus contention and input priority.

Page 10: Field Programmable Port Extender (FPX)

Network Interface Device (NID) routes cells to/from the RAD.

Reprogrammable Application Device (RAD) functions:
» will implement core router functions in the extensible router
» may also implement arbitrary packet processing functions

Functions for the extreme router:
» high speed packet storage manager
» packet classification & route lookup
– fast route lookup
– exact match filters
– 32 general filters
» flexible queue manager
– per-flow queues for reserved flows
– route packets to/from the SPC

[Figure: FPX block diagram: the NID connects to the RAD (400K gates + 80 KB of block RAM) over a 64-bit, 100 MHz (6.4 Gb/s) interface; the RAD has two 1 MB SRAMs (36-bit) and two 64 MB SDRAMs (64-bit), with 2 Gb/s external interfaces on each side.]

Page 11: Logical Port Architecture

[Figure: input side processing: reassembly contexts feed packet classification & route lookup on the FPX; packets go to special flow queues (served by plugins on the SPC, via the PCU) or to virtual output queues, under distributed queueing (DQ) control.]

[Figure: output side processing: mirrors the input side, with reassembly contexts (RC), packet classification, special flow queues served by SPC plugins, and per-link output queues.]

Page 12: FPX Packet Processor Block Diagram

[Figure: control section: a Control Cell Processor handles DQ status & rate control, route & filter updates, and register set updates & status, and drives the Classification and Route Lookup block (two SRAMs) and a register set. Data path: ISAR (from the switch and line card) feeds the Packet Storage Manager (includes the free space list, two SDRAMs, and header/discard pointers); header processing, the Queue Manager, and OSAR (to the switch and line card) complete the path.]

Page 13: Classification and Route Lookup (CARL)

Three lookup engines:
» route lookup for routing datagrams - best prefix match
» flow filters for multicast & reserved flows - exact match
» general filters (32) for management - exhaustive search

Input processing:
» parallel check of all three engines
» return the highest priority exclusive and highest priority non-exclusive results
» general filters have unique priorities
» all flow filters share a single priority; ditto for routes

Output processing:
» no route lookup on output

Route lookup & flow filters share off-chip SRAM; general filters are processed on-chip.

[Figure: an input demux feeds the Route Lookup, Flow Filters, and General Filters engines in parallel, followed by result processing & priority resolution; a bypass path carries headers around the engines.]
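To make the priority resolution concrete, here is a minimal C sketch of combining the three engines' results. The result structure and the lower-number-wins priority convention are assumptions; the real logic runs in FPX hardware.

    #include <stdbool.h>
    #include <stdint.h>

    #define PRIO_NONE 0xFFFF              /* lower number = higher priority */

    struct lookup_result {
        bool     valid;                   /* engine found a match */
        bool     exclusive;               /* exclusive vs. non-exclusive */
        uint16_t priority;                /* general filters: unique;
                                             flow filters/routes: shared */
        uint32_t action;                  /* queue id / output encoding */
    };

    /* Combine the three engines' results into the highest priority
     * exclusive and non-exclusive matches, as the slide describes. */
    void resolve(const struct lookup_result *route,
                 const struct lookup_result *flow,
                 const struct lookup_result *general,
                 struct lookup_result *best_excl,
                 struct lookup_result *best_nonexcl)
    {
        const struct lookup_result *eng[3] = { route, flow, general };

        best_excl->valid = best_nonexcl->valid = false;
        best_excl->priority = best_nonexcl->priority = PRIO_NONE;

        for (int i = 0; i < 3; i++) {
            if (!eng[i]->valid)
                continue;
            struct lookup_result *best =
                eng[i]->exclusive ? best_excl : best_nonexcl;
            if (eng[i]->priority < best->priority)
                *best = *eng[i];
        }
    }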

Page 14: Exact Match Lookup

Exact match lookup table used for reserved flows:
» includes LFS, signaled QoS flows and multicast
» and flows requiring processing by SPCs
» each of these flows has a separate queue in the QM
» multicast flows have two queues (recycling multicast)
» implemented using hashing

[Figure: the packet's source/destination fields feed a simple hash that indexes an on-chip SRAM table of ingress-valid and egress-valid bits; matching entries point to tag+data records in off-chip SRAM, with separate memory areas for ingress and egress packets. The tag is [src, dst, sport, dport, proto]; the data includes 2 outputs + 2 QIDs, LFS rates, packet and byte counters, and flags.]
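A simplified software model of the hash-based exact match table is sketched below. The table size and hash function are assumptions, and collision handling is omitted; as in the figure, the hash keys on src/dst while the match checks the full tag.

    #include <stdint.h>

    #define TABLE_SIZE 4096               /* assumed size, power of two */

    struct flow_key {                     /* the tag from the slide */
        uint32_t src, dst;
        uint16_t sport, dport;
        uint8_t  proto;
    };

    struct flow_entry {
        struct flow_key tag;
        uint8_t  valid;
        uint8_t  output[2];               /* 2 outputs + 2 QIDs: the second
                                             pair serves recycling multicast */
        uint16_t qid[2];
        uint64_t pkts, bytes;             /* per-flow counters */
    };

    static struct flow_entry table[TABLE_SIZE];

    static unsigned hash(const struct flow_key *k)   /* "simple hash" */
    {
        return (k->src * 2654435761u ^ k->dst) & (TABLE_SIZE - 1);
    }

    static int key_eq(const struct flow_key *a, const struct flow_key *b)
    {
        return a->src == b->src && a->dst == b->dst &&
               a->sport == b->sport && a->dport == b->dport &&
               a->proto == b->proto;
    }

    /* Returns the flow entry for an exact match on the full tag, or NULL
     * if the packet should be treated as a best-effort datagram. */
    struct flow_entry *flow_lookup(const struct flow_key *k)
    {
        struct flow_entry *e = &table[hash(k)];
        return (e->valid && key_eq(&e->tag, k)) ? e : 0;
    }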

Page 15: General Filter Match

General filter match considers the full 5-tuple:
» prefix match on source and destination addresses
» range match on source and destination ports
» exact or wildcard match on protocol
» each filter has a priority and may be exclusive or non-exclusive

Intended primarily for management filters:
» firewall filters
» class-based monitoring
» class-based special processing

Implemented using parallel exhaustive search:
» limit of 32 filters

[Figure: filter memory feeding a bank of parallel matchers.]
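The following sketch models the 32-matcher search in software; the hardware checks all filters in parallel, so the loop here stands in for the matcher bank. The field layout and low-number-wins priority convention are assumptions.

    #include <stdint.h>

    #define NFILTERS 32                   /* hardware limit from the slide */

    struct gfilter {
        uint8_t  valid;
        uint32_t saddr, smask;            /* prefix: (addr & mask) == saddr */
        uint32_t daddr, dmask;
        uint16_t splo, sphi;              /* port ranges, inclusive */
        uint16_t dplo, dphi;
        int16_t  proto;                   /* -1 = wildcard */
        uint8_t  priority;                /* unique per filter, low wins */
        uint8_t  exclusive;
    };

    static struct gfilter filt[NFILTERS];

    /* Returns the index of the highest priority matching filter, or -1. */
    int general_match(uint32_t sa, uint32_t da, uint16_t sp, uint16_t dp,
                      uint8_t proto)
    {
        int best = -1;
        for (int i = 0; i < NFILTERS; i++) {   /* parallel in hardware */
            const struct gfilter *f = &filt[i];
            if (!f->valid) continue;
            if ((sa & f->smask) != f->saddr) continue;
            if ((da & f->dmask) != f->daddr) continue;
            if (sp < f->splo || sp > f->sphi) continue;
            if (dp < f->dplo || dp > f->dphi) continue;
            if (f->proto >= 0 && (uint8_t)f->proto != proto) continue;
            if (best < 0 || f->priority < filt[best].priority)
                best = i;
        }
        return best;
    }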

Page 16: Fast IP Lookup (Eatherton & Dittia)

Multibit trie with clever data encoding:
» small memory requirements (< 7 bytes per prefix)
» small memory bandwidth and a simple lookup yield fast lookup rates
» updates have negligible impact on lookup performance

Avoid the impact of external memory latency on throughput by interleaving several concurrent lookups:
» an 8-lookup-engine configuration uses about 6% of Virtex 2000E logic cells

[Figure: worked lookup example for address 101 100 101..., stepping 3 bits at a time through trie nodes and their internal and external bit vectors.]
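The sketch below shows how a Tree Bitmap lookup step can work with a 3-bit stride, matching the figure's 3-bit chunks. The node layout is illustrative, not the FPX's actual memory format; children and next-hop results are stored contiguously and located by counting bits in the bitmaps.

    #include <stdint.h>

    struct tb_node {
        uint8_t  internal;    /* 7 bits: prefixes of length 0..2 in node */
        uint8_t  external;    /* 8 bits: which of the 8 children exist */
        uint32_t child_base;  /* index of first child (contiguous) */
        uint32_t result_base; /* index of first next-hop record */
    };

    static int popcount8(uint8_t v)
    {
        int n = 0;
        while (v) { n += v & 1; v >>= 1; }
        return n;
    }

    /* Returns a next-hop index, or -1 if no prefix matches. The low 2
     * address bits are ignored in this simplified 3-bit-stride walk. */
    int tb_lookup(const struct tb_node *nodes, uint32_t addr)
    {
        uint32_t node = 0;                /* root */
        int best = -1;
        for (int shift = 29; shift >= 0; shift -= 3) {
            const struct tb_node *n = &nodes[node];
            unsigned bits = (addr >> shift) & 7;

            /* Longest prefix inside this node: the internal bit for a
             * prefix p of length L sits at position (1 << L) - 1 + p. */
            for (int len = 2, p = bits >> 1; len >= 0; len--, p >>= 1) {
                int pos = (1 << len) - 1 + p;
                if (n->internal & (1u << pos)) {
                    best = n->result_base +
                           popcount8(n->internal & ((1u << pos) - 1));
                    break;
                }
            }
            if (!(n->external & (1u << bits)))
                break;                    /* no child: best so far wins */
            node = n->child_base +
                   popcount8(n->external & ((1u << bits) - 1));
        }
        return best;
    }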

Page 17: Lookup Throughput

[Chart: millions of lookups per second vs. number of lookup engines (1 to 8), for Mae-West single tree, Mae-West split tree, and worst-case single tree; throughput grows linearly with the number of engines up to the 450 MB/s SRAM bandwidth limit.]

The split tree cuts storage by 30%.

Page 18: Update Performance

[Chart: millions of lookups per second vs. number of lookup engines (1 to 8), with no updates, 10^5 updates/second, and 10^6 updates/second on a single tree.]

Reasonable update rates have little impact on lookup throughput.

Page 19: Queue Manager (QM) Logical View

[Figure: arriving packets enter a separate queue set for each output (0 through 8); in each set a VOQ packet scheduler serves the reserved flow queues and the datagram queue, and an SPC packet scheduler handles traffic to and from the SPC; the sets feed the switch under DQ control. On the output side, a link packet scheduler serves reserved flow queues and datagram queues feeding the link.]

» 64 hashed datagram queues for traffic isolation
» separate queue for each reserved flow
» separate queue set for each output
» separate queue for each SPC flow

A sketch of the datagram-queue hashing appears below.
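A minimal sketch of mapping best-effort datagrams to the 64 hashed queues; the 64-queue count is from the slide, but the particular hash function is an assumption.

    #include <stdint.h>

    /* Map a datagram's addressing fields to one of the 64 datagram
     * queues, spreading unrelated flows across queues for isolation. */
    unsigned datagram_queue(uint32_t src, uint32_t dst,
                            uint16_t sport, uint16_t dport)
    {
        uint32_t h = src ^ dst ^ (((uint32_t)sport << 16) | dport);
        h *= 2654435761u;                 /* multiplicative mixing */
        return h >> 26;                   /* top 6 bits: queue 0..63 */
    }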

Page 20: Backlogged TCP Flows with Tail Discard

[Chart: buffer level vs. time; tail drop, 20K buffer, 100 ms RTT, 100 sources, 500 Mb/s link.]

[Chart: buffer level vs. time; tail drop, 1K buffer, 100 ms RTT, 100 sources, 500 Mb/s link.]

With large buffers we get large delay variance; with small buffers we get underflow and low throughput.

Page 21: DRR with Discard from Longest Queue

Smaller fluctuations, but still significant.

[Chart: buffer level vs. time; DRR, 20K buffer, 100 ms RTT, 100 sources, 500 Mb/s link.]

[Chart: buffer level vs. time; DRR, 1K buffer, 100 ms RTT, 100 sources, 500 Mb/s link.]

Page 22: Queue State DRR (QSDRR)

Add hysteresis to the packet discard policy:
» keep discarding from the same queue until it becomes the shortest non-empty queue

[Chart: buffer level vs. time; QSDRR, 1K buffer, 100 ms RTT, 100 sources, 500 Mb/s link.]

[Chart: buffer level vs. time; QSDRR, 20K buffer, 100 ms RTT, 100 sources, 500 Mb/s link.]

Low variation, even with small queues; low delay; no tuning.
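A minimal sketch of the QSDRR discard rule, assuming a simple array of per-queue lengths (the real QM tracks this in hardware):

    #include <stdint.h>

    #define NQUEUES 64

    static uint32_t qlen[NQUEUES];        /* per-queue backlog, packets */
    static int drop_target = -1;          /* hysteresis state */

    static int longest(void)
    {
        int best = 0;
        for (int i = 1; i < NQUEUES; i++)
            if (qlen[i] > qlen[best]) best = i;
        return best;
    }

    static int shortest_nonempty(void)
    {
        int best = -1;
        for (int i = 0; i < NQUEUES; i++)
            if (qlen[i] > 0 && (best < 0 || qlen[i] < qlen[best]))
                best = i;
        return best;
    }

    /* Called on buffer overflow: pick the queue to drop from. QSDRR's
     * hysteresis keeps the current victim until it shrinks to the
     * shortest non-empty queue, then moves to the new longest queue. */
    int pick_drop_queue(void)
    {
        if (drop_target < 0 || qlen[drop_target] == 0 ||
            drop_target == shortest_nonempty())
            drop_target = longest();
        return drop_target;
    }

Because the victim changes only when it becomes the shortest non-empty queue, drops concentrate on a few flows at a time rather than hitting all flows at once, which is what damps the synchronized buffer oscillations shown in the charts.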

Page 23: Packet Scheduling with Approximate Radix Sorting

To implement virtual time schedulers, we need to quickly find the queue whose "lead packet" has the smallest virtual finish time.
» using a priority queue, this requires O(log n) time for n queues

Use approximate radix sorting, with compensation – O(1).
» timing wheels with increasing granularity and range
» approximate sorting produces inter-packet timing errors
» observe errors & compensate when the next packet is scheduled

Fast-forward bits are used to skip past empty slots. The scheduler puts no limit on the number of queues. Two copies of the data structure are needed for the approximate version of WF2Q+.

[Figure: three timing wheels of increasing granularity feed an output list; fast-forward bits (00110100, 10000010, 00101010) mark the occupied slots.]
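The sketch below models a single timing wheel with fast-forward bits, assuming 64 slots and unit slot granularity; the FPX chains several such wheels of increasing granularity and range, and __builtin_ctzll (a GCC/Clang builtin) stands in for the hardware's priority encoder.

    #include <stdint.h>

    #define SLOTS 64                      /* one fast-forward bit per slot */

    struct wheel {
        uint64_t ff;                      /* bit s set = slot s non-empty */
        int      head[SLOTS];             /* per-slot queue list heads, -1 */
        uint64_t base;                    /* virtual time of slot 0 */
    };

    /* Insert queue q whose lead packet finishes at virtual time t; t is
     * truncated to the slot granularity (the "approximate" part). */
    void wheel_insert(struct wheel *w, int next[], int q, uint64_t t)
    {
        unsigned s = (unsigned)(t - w->base) % SLOTS;
        next[q] = w->head[s];             /* push onto the slot's list */
        w->head[s] = q;
        w->ff |= 1ull << s;
    }

    /* Find the first non-empty slot at or after virtual time t, using the
     * fast-forward bits to skip empty slots in one step; -1 if empty. */
    int wheel_first_slot(const struct wheel *w, uint64_t t)
    {
        unsigned s = (unsigned)(t - w->base) % SLOTS;
        uint64_t m = w->ff & (~0ull << s);    /* candidate slots >= s */
        if (!m) m = w->ff;                    /* wrap around to slot 0 */
        if (!m) return -1;
        return __builtin_ctzll(m);            /* lowest set bit */
    }

Truncating finish times to slots introduces the inter-packet timing errors the slide mentions; the scheduler records each error and adjusts the next insertion for that queue to compensate.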

Page 24: Resource Usage Estimates

Key resources in Xilinx FPGAs:
» flip flops - 38,400
» lookup tables (LUTs) - 38,400; each can implement any 4-input Boolean function
» block RAMs (4 Kbits each) - 160

                    Number                  % of total
              flops    LUTs   RAMs    flops   LUTs    RAMs
    CARL      3,781   5,199     28     9.8%  13.5%   17.5%
    CCP       1,500     750      5     3.9%   2.0%    3.1%
    FIFOs       159     340     12     0.4%   0.9%    7.5%
    ISAR      3,674   5,053     28     9.6%  13.2%   17.5%
    OSAR      3,795   3,208     22     9.9%   8.4%   13.8%
    PSM       6,196   5,746     20    16.1%  15.0%   12.5%
    QM        5,605   6,472     14    14.6%  16.9%    8.8%
    Total    24,710  26,768    129    64.3%  69.7%   80.6%
    Resources 38,400 38,400    160

Page 25: FPGA Performance Characteristics

[Chart: delay (ns) vs. number of logic levels (LUTs), 0 to 8, for XCV2000e-6 and XCV1000e-7 parts, showing the minimum and maximum separation between the two.]

Page 26: Summary

Version 1 hardware status:
» hardware operating in lab, passing packets
» but still have some bugs to correct
» one day for a typical test-diagnose-correct cycle
» version 1 has a simplified queue manager

Planning several system demos in the next month:
» system-level throughput testing, focused on lookup processing
» verifying basic fair queueing behavior
» TCP SYN attack suppressor (sketched below)
– an SPC-resident plugin monitors new TCP connections going to a server
– when there are too many "half-open" connections, the oldest are reset
– flow filters are inserted for stable connections, enabling hardware forwarding

Expect to complete version 2 hardware in the next six months.
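To illustrate the SYN attack suppressor's logic, here is a hedged C sketch of what an SPC-resident plugin might do. MAX_HALF_OPEN, send_rst, and insert_flow_filter are all assumed names for the plugin's environment, and the linear scans would need better data structures at line rate.

    #include <stdint.h>

    #define MAX_HALF_OPEN 1024            /* assumed threshold */

    struct conn { uint32_t src, dst; uint16_t sp, dp; uint64_t t_syn; };

    /* Hypothetical hooks into the SPC/FPX environment. */
    extern void send_rst(const struct conn *c);
    extern void insert_flow_filter(const struct conn *c);

    static struct conn half_open[MAX_HALF_OPEN];
    static int n_half_open;

    /* SYN seen: record the embryonic connection; if the table is full,
     * reset the oldest half-open connection to make room. */
    void on_syn(struct conn c, uint64_t now)
    {
        if (n_half_open == MAX_HALF_OPEN) {
            int oldest = 0;
            for (int i = 1; i < MAX_HALF_OPEN; i++)
                if (half_open[i].t_syn < half_open[oldest].t_syn)
                    oldest = i;
            send_rst(&half_open[oldest]);
            half_open[oldest] = half_open[--n_half_open];
        }
        c.t_syn = now;
        half_open[n_half_open++] = c;
    }

    /* Handshake completed: the connection is stable, so stop tracking it
     * and insert a flow filter so the FPX forwards it in hardware. */
    void on_established(const struct conn *c)
    {
        for (int i = 0; i < n_half_open; i++)
            if (half_open[i].src == c->src && half_open[i].dst == c->dst &&
                half_open[i].sp == c->sp && half_open[i].dp == c->dp) {
                half_open[i] = half_open[--n_half_open];
                insert_flow_filter(c);
                return;
            }
    }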