CprE / ComS 583 Reconfigurable Computing


CprE / ComS 583 Reconfigurable Computing

Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University

Lecture #8 – Reconfigurable Networking


CprE 583 – Reconfigurable Computing · September 13, 2007 · Lect-08

Recap – Genetic Pattern Matching

• Comparing strings by edit distance
• Motivation: The Human Genome Project
  • Do two genetic strings match?
  • How are they related?
  • When biologists characterize a new sequence, they want to compare it to the (growing) database of known sequences
• Abstraction:
  • What is the cost of transforming s into t?
  • Given: costs for insertion, deletion, substitution


Alphabet and Costs

• Alphabet
  • Letters in the string. For DNA, there are four:
    • A (Adenine)
    • C (Cytosine)
    • T (Thymine)
    • G (Guanine)
• Transformation Costs
  • Insert: 1, Delete: 1, Substitute: 2, Match: 0
• Type of comparison
  • One target to many sources
  • One target to one source


Substitution Example

Word      Move             Cost
baboon
bab|on    Delete ‘o’       1
bobon     Substitute ‘o’   2
boubon    Insert ‘u’       1
bourbon   Insert ‘r’       1
bourbon   Match?           0
Total cost:                5


Dynamic Programming Solution

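• Source sequence: $s_1, s_2, \ldots, s_m$

• Target sequence: $t_1, t_2, \ldots, t_n$

• $d_{i,j}$ = distance between subsequence $s_1, s_2, \ldots, s_i$ and subsequence $t_1, t_2, \ldots, t_j$, where

$$d_{0,0} = 0$$

$$d_{i,0} = d_{i-1,0} + \mathrm{Delete}(s_i)$$

$$d_{0,j} = d_{0,j-1} + \mathrm{Insert}(t_j)$$

$$d_{i,j} = \min \begin{cases} d_{i-1,j} + \mathrm{Delete}(s_i) \\ d_{i,j-1} + \mathrm{Insert}(t_j) \\ d_{i-1,j-1} + \mathrm{Substitute}(s_i, t_j) \end{cases}$$

• Distance(Source, Target) = $d_{m,n}$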
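The recurrence can be sketched in software as follows. This is a minimal illustration, assuming the costs from the "Alphabet and Costs" slide (insert 1, delete 1, substitute 2, match 0); the function name `edit_distance` and its parameter names are made up for this example.

```python
def edit_distance(source, target, insert_cost=1, delete_cost=1, substitute_cost=2):
    """Fill the (m+1) x (n+1) table d and return d[m][n]."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # First column: delete every source character.
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + delete_cost
    # First row: insert every target character.
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + insert_cost

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Substitute(s_i, t_j) is 0 when the characters already match.
            sub = 0 if source[i - 1] == target[j - 1] else substitute_cost
            d[i][j] = min(
                d[i - 1][j] + delete_cost,
                d[i][j - 1] + insert_cost,
                d[i - 1][j - 1] + sub,
            )
    return d[m][n]


print(edit_distance("baboon", "bourbon"))
```

With these costs, transforming "baboon" into "bourbon" gives the same total cost as the substitution example.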


Dynamic Programming Example

      b  a  b  o  o  n
   0  1  2  3  4  5  6
b  1  0  1  2  2  3  4
o  2  1  2  2  2  2  3
u  3  2  3  3  3  3  4
r  4  3  4  4  3  4  5
b  5  3  4  4  4  5  5
o  6  4  5  5  4  5  6
n  7  5  6  6  5  6  5


Parallelism on the Anti-Diagonal

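[Figure: the distance table from the previous slide, with one cell $d_{i,j}$ and the three cells it depends on, $d_{i-1,j}$, $d_{i,j-1}$, and $d_{i-1,j-1}$, highlighted. All cells on the same anti-diagonal can be computed in parallel.]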
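A software sketch of the anti-diagonal ordering is shown below. It assumes the same costs as before (insert 1, delete 1, substitute 2). The function name is hypothetical, and the inner loop runs sequentially here, although its iterations are independent and are what parallel hardware would evaluate at once.

```python
def edit_distance_wavefront(source, target, insert_cost=1, delete_cost=1, substitute_cost=2):
    """Compute the edit-distance table one anti-diagonal (i + j == k) at a time."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]

    for k in range(1, m + n + 1):
        # Every cell on this anti-diagonal reads only diagonals k-1 and k-2,
        # so the iterations of this inner loop do not depend on each other.
        for i in range(max(0, k - n), min(m, k) + 1):
            j = k - i
            candidates = []
            if i > 0:
                candidates.append(d[i - 1][j] + delete_cost)
            if j > 0:
                candidates.append(d[i][j - 1] + insert_cost)
            if i > 0 and j > 0:
                sub = 0 if source[i - 1] == target[j - 1] else substitute_cost
                candidates.append(d[i - 1][j - 1] + sub)
            d[i][j] = min(candidates)
    return d[m][n]


print(edit_distance_wavefront("baboon", "bourbon"))
```

The table has m + n anti-diagonals after the origin, so a fully parallel array needs on the order of m + n steps instead of m * n.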


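Bidirectional PE Example

[Figure: a chain of three BidirPE elements, each holding a PEDist register. The source character and distance (SCin, SDin → SCout, SDout) pass through the chain in one direction; the target character and distance (TCin, TDin → TCout, TDout) pass through in the opposite direction.]

if (SCin != 0) and (TCin != 0)
    PEDist ← min( PEDist + Substitute(SCin, TCin),
                  TDin + Delete(SCin),
                  SDin + Insert(TCin) )
else-if (SCin != 0)
    PEDist ← SDin
else-if (TCin != 0)
    PEDist ← TDin
endif

SCout ← SCin
TCout ← TCin
SDout ← PEDist
TDout ← PEDist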
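The per-PE update rule can be written as a single function. This is only a sketch of one processing element for one clock step, not a simulation of the full array. It assumes `None` stands in for the "no character" value that the slide encodes as 0, uses the costs from the earlier slide, and the name `bidir_pe_step` is invented for this example.

```python
def bidir_pe_step(pe_dist, sc_in, sd_in, tc_in, td_in,
                  insert_cost=1, delete_cost=1, substitute_cost=2):
    """One update of a bidirectional PE.

    Returns (pe_dist, sc_out, sd_out, tc_out, td_out).
    """
    if sc_in is not None and tc_in is not None:
        # Both streams present: apply the three-way minimum.
        sub = 0 if sc_in == tc_in else substitute_cost
        pe_dist = min(pe_dist + sub,
                      td_in + delete_cost,
                      sd_in + insert_cost)
    elif sc_in is not None:
        # Only a source character: forward the source distance.
        pe_dist = sd_in
    elif tc_in is not None:
        # Only a target character: forward the target distance.
        pe_dist = td_in
    # Characters pass through unchanged; both distance outputs carry PEDist.
    return pe_dist, sc_in, pe_dist, tc_in, pe_dist


print(bidir_pe_step(0, "b", 1, "b", 1))
```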


Bidirectional Summary

• 16 CLBs/PE
• 384 PEs/Board
• 2,100 Million Cells/sec
• Requires 2*(m+n) PEs
• Uses only half the processors at any one time
• Must stream both source and target for each comparison
• Makes comparison against large DB impractical


Genetic Search Performance

• Nearly linear scaling in cell updates per second (CUPS)

• Need to reuse array for large patterns

Hardware       CUPS      λ       Area               CUP/λ²s
Splash 2 x16   43,000M   0.60μ   500Mλ² x 17 x 16   0.32
Splash 2       3,000M    0.60μ   500Mλ² x 16        0.38
Splash 1       370M      0.60μ   420Mλ² x 32        0.028
P-NAC (34)     500M      2.0μ    7.8Mλ² x 34        1.9
CM-2 (64K)     150M      ?
CM-5 (32)      33M       ?
SPARC 10       1.2M      0.40μ   1.6Gλ²             0.00075
SPARC 1        0.87M     0.75μ   273Mλ²             0.0032
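The last column is CUPS divided by total area in λ². The short check below recomputes it from the CUPS and Area columns. The dictionary layout and the name `cups_per_area` are illustrative, and rows without a published area (CM-2, CM-5) are omitted.

```python
# (CUPS, total area in lambda^2) taken from the table above.
hardware = {
    "Splash 2 x16": (43_000e6, 500e6 * 17 * 16),
    "Splash 2":     (3_000e6,  500e6 * 16),
    "Splash 1":     (370e6,    420e6 * 32),
    "P-NAC (34)":   (500e6,    7.8e6 * 34),
    "SPARC 10":     (1.2e6,    1.6e9),
    "SPARC 1":      (0.87e6,   273e6),
}


def cups_per_area(cups, area_lambda2):
    """Cell updates per second per lambda^2 of silicon."""
    return cups / area_lambda2


density = {name: cups_per_area(*row) for name, row in hardware.items()}
for name, value in density.items():
    print(f"{name:13s} {value:.2g}")
```

Normalizing by area in λ² removes the effect of process generation, which is why the custom P-NAC design scores highest despite its modest raw CUPS.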


Outline

• Recap – Pattern Matching on Splash-2
• The Field-Programmable Port Extender (FPX)
• FPX Architecture
• FPX Programming Model
• FPX Applications
  • Pattern Matching
  • Packet Classification
  • Rule Processing


Application – Network Processing

• Networking applications well-suited for reconfigurable hardware
  • Target signatures change often
  • Massive quantities of stream-based data
  • Repetitive operations
• Connecting up to a realistic networking environment is hard
  • Washington University experimental setup one of the best
  • Shows importance of both memory and processing capability
  • Numerous experiments performed over the past five years


Network Routing with the FPX

• FPX Modules distributed across each port of a switch
• IP packets (over ATM) enter and depart line card
• Packet fragments processed by modules
• Advantages:
  • New protocols implemented directly in silicon
  • Easy to upgrade in the field

[Figure: IP packets enter through an OC3/OC12/OC48 line card, pass through a Field-programmable Port Extender (FPX), cross the Gigabit Switch Fabric through its IPP and OPP blocks, and leave through a second FPX and line card.]


FPX Hardware Device


FPX Hardware in a WUGS-20 Switch


FPGA-based Router

• FPX module contains two FPGAs
  • NID – network interface device
    • Performs data queuing
  • RAD – reprogrammable application device
    • Specialized control sequences

[Figure: FPX board layout (WashU / ARL, July 2000, JL / MR). Labeled parts: RAD (Reprogrammable Application Device, Virtex 1000E fg680), NID (Network Interface Device), two 8 Mbit ZBT SRAMs, two SDRAMs (backside), RAD program SRAM, NID EPROM, oscillators (100 MHz, 62.5 MHz, 10 MHz), 1.8 V switcher VRM, JTAG, RAD/NID status, OC3 / OC12 / OC48 linecard connector, and WUGS switch backplane connector.]


Reprogrammable Application Device

• Spatial Re-use of FPGA Resources
  • Modules implemented using FPGA logic
  • Module logic can be individually reprogrammed
• Shared Access to off-chip resources
  • Memory Interfaces to SRAM and SDRAM
  • Common Datapath to send and receive data

[Figure: the RAD holding two Modules, with data paths to two SRAMs and two SDRAMs and Network Interfaces to the NID.]


Architecture of the FPX

• RAD
  • Large Xilinx FPGA
  • Attaches to SRAM and SDRAM
  • Reprogrammable over network
  • Provides two user-defined Module Interfaces
• NID
  • Provides Utopia Interfaces between switch & line card
  • Forwards cells to RAD
  • Programs RAD

[Figure: the NID sits between the Switch and the LineCard, with VC and EC blocks on its ports and a RAD Program path. Above it, the RAD holds two Modules, each with data paths to an SRAM and an SDRAM.]


Architecture of the FPX (cont.)

[Figure: FPX Block Diagram next to an FPX Photo. Labels in the block diagram: RAD (FPGA), NID (FPGA), Extensible Modules, Layered Protocol Wrappers, Route, Filter, Flow Buffer, Program Cache, Config PROM, SDRAM, SRAM, Memory, Network Interface, Switch.]


FPX SRAM

• Provide low latency for fast table-lookups
• Zero Bus Turnaround (ZBT) allows back-to-back read / write operations every 10ns
• Dual, Independent Memories
• 36-bit wide bus


FPX SDRAM

• Dual, independent SDRAM memories
• 64-bit wide, 100 MHz
• 64Mb / Module : 128 Mb total [expandable]
• Burst-based transactions [1-8 word transfers]
• Latency of 14 cycles to Read/Write 8-word burst
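As a rough comparison of the two memory types, the numbers on the SRAM and SDRAM slides can be turned into per-memory bandwidth estimates. This sketch assumes the SRAM completes one 36-bit access per 10 ns (a 100 MHz rate inferred from that figure) and that the 14-cycle SDRAM latency covers one whole isolated 8-word burst with no overlap between bursts. Real controllers may overlap bursts, so treat the SDRAM figure as a lower bound under that assumption.

```python
def bits_per_second(bits_per_word, words, cycles, clock_hz):
    """Bits moved per second when `words` words take `cycles` clock cycles."""
    return bits_per_word * words * clock_hz / cycles


# ZBT SRAM: one 36-bit word every 10 ns cycle.
sram_bps = bits_per_second(bits_per_word=36, words=1, cycles=1, clock_hz=100e6)

# SDRAM: an 8-word, 64-bit burst completes in 14 cycles at 100 MHz.
sdram_bps = bits_per_second(bits_per_word=64, words=8, cycles=14, clock_hz=100e6)

print(f"SRAM:  {sram_bps / 1e9:.2f} Gbit/s per memory")
print(f"SDRAM: {sdram_bps / 1e9:.2f} Gbit/s per memory")
```

Under these assumptions the two come out similar in raw bandwidth; the SRAM's advantage is the single-cycle access, which is what table lookups need.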


Routing Traffic Flows

• Functions
  • Check packets for errors
  • Process commands
    • Control, status, & reprogramming
  • Implement per-flow forwarding

[Figure: the NID between the Switch and the LineCard, showing its VC, EC, and ccp blocks.]

• Traffic flows routed among
  • Switch
  • Line Card
  • RAD.Switch
  • RAD.Linecard


Typical Flow Configurations

[Figure: six NID flow configurations, each drawn over the same four ports (Switch, LineCard, RAD Switch, RAD LineCard) with VC, EC, and ccp blocks:]

• Default Flow Action (Bypass)
• Ingress Processing (IP Routing)
• Full RAD Processing (Packet Routing and Reassembly)
• Egress Processing (Per-flow Output Queueing)
• Full Loopback Testing (System Test)
• Partial Loopback Testing (Egress Flow Processing Test)


Reprogramming Logic

• NID programs at boot from EPROM
• Switch Controller writes RAD configuration memory to NID
  • Configuration file for RAD arrives transmitted over network via control cells
• Switch Controller issues {Full/Partial} reconfigure command
• NID reads RAD config memory to program RAD
  • Performs complete or partial reprogramming of RAD

[Figure: FPX board layout (RAD Virtex 1000E fg680, NID, NID EPROM, RAD Program FIFO, 8 Mbit ZBT SRAMs, SDRAMs, 62.5 MHz and 100 MHz oscillators, 1.8 V and 2.5 V VRMs, linecard and WUGS switch backplane connectors) attached to a Switch Element whose ports each have an IPP, an OPP, and a line card (LC).]


FPX Interface Provides

• Well defined Interface
  • Utopia-like 32-bit fast data interface
  • Flow control allows back-pressure
• Flow Routing
  • Arbitrary permutations of packet flows through ports
• Dynamically Reprogrammable
  • Other modules continue to operate even while a new module is being reprogrammed
• Memory Access
  • Shared access to SRAM and SDRAM
  • Request/Grant protocol


Pattern Matching using the FPX

• Use Hardware to detect a pattern in data
• Modify packet based on match
• Pipeline operation to maximize throughput


“Hello, World” Module Function


Logical Implementation

[Figure: VCI Match → Append “WORLD” to payload → New Cell]


The Wrapper Concept

[Figure: an App enclosed by two layers of Wrapper.]


AAL5 Encapsulation

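[Figure: an AAL5 frame carried in three ATM cells. Each cell has an ATM Header and a Payload; the last cell ends with Padding and the AAL5 Trailer (options, length, CRC-32). The last-cell bit is 0, 0, 1 across the three cells.]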

• Payload is packed in cells

• Padding may be added

• 64 bit Trailer at end of cell

• Trailer contains CRC-32

• Last Cell indication bit (last bit of PTI field)
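The packing rule can be made concrete with a small calculation. This sketch relies on the standard ATM cell payload size of 48 bytes and the 8-byte (64-bit) trailer stated above; the function names are invented for this example.

```python
CELL_PAYLOAD_BYTES = 48   # payload bytes carried by one ATM cell
AAL5_TRAILER_BYTES = 8    # 64-bit trailer: options, length, CRC-32


def aal5_layout(payload_len):
    """Return (number_of_cells, padding_bytes) for one AAL5 frame."""
    total = payload_len + AAL5_TRAILER_BYTES
    cells = -(-total // CELL_PAYLOAD_BYTES)        # ceiling division
    padding = cells * CELL_PAYLOAD_BYTES - total   # fills the last cell
    return cells, padding


def last_cell_bits(payload_len):
    """Last-cell indication (low bit of PTI) for each cell of the frame."""
    cells, _ = aal5_layout(payload_len)
    return [0] * (cells - 1) + [1]


print(aal5_layout(100), last_cell_bits(100))
```

For a 100-byte payload this yields three cells with 36 bytes of padding, which matches the three-cell shape of the figure.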


HelloBob Module

[Figure: the HelloBob module between Input and Output. Labeled blocks: Cell Processor, Frame Processor, IP Processor, UDP Processor, UDP Echo, Hello Bob, and an SRAM Interface.]


Results: Performance

• Operating Frequency: 119 MHz
  • 8.4ns critical path
  • Well within the 10ns period of the RAD's clock
  • Targeted to RAD’s V1000E-FG680-7
• Maximum packet processing rate:
  • 7.1 Million packets per second
    • (100 MHz)/(14 Clocks/Cell)
  • Circuit handles back-to-back packets
• Slice utilization:
  • 0.4% (49/12,288 slices)
  • Less than one half of one percent of chip resources
• Search technique can be adapted for other types of data matching and modification
  • Regular expressions
  • Parsing image content …
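The three headline numbers follow directly from the figures on the slide. The variable names below are chosen only for this check.

```python
critical_path_s = 8.4e-9         # longest combinational path
rad_clock_hz = 100e6             # RAD clock (10 ns period)
clocks_per_cell = 14             # clock cycles spent per ATM cell
slices_used, slices_total = 49, 12_288

max_frequency_hz = 1.0 / critical_path_s
packets_per_second = rad_clock_hz / clocks_per_cell
slice_utilization = slices_used / slices_total

print(f"max frequency: {max_frequency_hz / 1e6:.0f} MHz")
print(f"packet rate:   {packets_per_second / 1e6:.1f} M packets/s")
print(f"utilization:   {slice_utilization:.1%}")
```

The packet rate is computed at the 100 MHz RAD clock, not at the 119 MHz the circuit could reach, so the 8.4 ns critical path leaves timing margin.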


CAM-based Packet Matching

• Sample Packet:
  • Source Address = 128.252.5.5 (dotted.decimal)
  • Destination Address = 141.142.2.2 (dotted.decimal)
  • Source Port = 4096 (decimal)
  • Destination Port = 50 (decimal)
  • Protocol = TCP (6)
  • Payload = “Consolidate your loans. CALL NOW”
  • Payload Lists = { General SPAM (0), Save Money SPAM (1) }
  • Content Vector = “00000011” (binary) = x”03” (hex)

Bits       Field       Value (hex)
111–104    Content     03
103–72     Src IP      80FC0505
71–40      Dest IP     8D8E0202
39–24      Src Port    1000
23–8       Dest Port   0050
7–0        Proto       06


Sample Filter

• Source Address = 128.252.0.0 / 16
• Destination Address = 141.142.0.0 / 16
• Source Port = Don’t Care
• Destination Port = 50
• Protocol = TCP (6)
• Payload includes general SPAM (List 0)

Field       IP Packet   Value      Mask (1 = care, 0 = don’t care)
Content     03          01         01
Src IP      80FC0505    80FC0000   FFFF0000
Dest IP     8D8E0202    8D8E0000   FFFF0000
Src Port    1000        0000       0000
Dest Port   0050        0050       FFFF
Proto       06          06         FF

DROP the packet: it matches the filter
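The value/mask comparison can be sketched in a few lines. The field values are the hex values shown for the sample packet and filter; the field order and widths (8 + 32 + 32 + 16 + 16 + 8 = 112 bits) are assumed from the bit positions in the packet layout, and the helper names `pack` and `matches` are hypothetical.

```python
# (field name, width in bits), most significant field first.
FIELDS = [("content", 8), ("src_ip", 32), ("dst_ip", 32),
          ("src_port", 16), ("dst_port", 16), ("proto", 8)]


def pack(values):
    """Concatenate the fields into one 112-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def matches(packet, value, mask):
    """Ternary match: compare only the bits where the mask is 1."""
    return (packet & mask) == (value & mask)


packet = pack({"content": 0x03, "src_ip": 0x80FC0505, "dst_ip": 0x8D8E0202,
               "src_port": 0x1000, "dst_port": 0x0050, "proto": 0x06})
value = pack({"content": 0x01, "src_ip": 0x80FC0000, "dst_ip": 0x8D8E0000,
              "src_port": 0x0000, "dst_port": 0x0050, "proto": 0x06})
mask = pack({"content": 0x01, "src_ip": 0xFFFF0000, "dst_ip": 0xFFFF0000,
             "src_port": 0x0000, "dst_port": 0xFFFF, "proto": 0xFF})

print("DROP" if matches(packet, value, mask) else "PASS")
```

In hardware the AND and compare happen across all 112 bits in parallel; the integer operations here model the same logic.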


Packet Classifier with FlowID

[Figure: a CAM Table of N entries. Each entry holds a 112-bit CAM MASK [i] and CAM VALUE [i] plus a 16-bit Flow ID [i]. The 112 input bits come from the IP header (Source Address, Destination Address, Source Port, Dest. Port, Protocol) together with the Payload Match Bits. Mask Matchers and Value Comparators test every entry, and a Priority Encoder selects from the Flow List the Resulting Flow Identifier (16 bits).]
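A software model of the classifier is a list of (mask, value, flow ID) entries searched in priority order. This sketch assumes the priority encoder favors the lowest-numbered matching entry, which the slide does not state; the table contents are made-up 8-bit keys for brevity, where the real table uses 112-bit keys.

```python
def classify(key, cam_table):
    """Return the flow ID of the first matching CAM entry, or None."""
    for mask, value, flow_id in cam_table:
        # Same ternary comparison as a single CAM entry.
        if (key & mask) == (value & mask):
            return flow_id
    return None


# Hypothetical entries: (mask, value, flow ID), highest priority first.
cam_table = [
    (0xFF, 0x12, 7),   # exact match on 0x12
    (0xF0, 0x10, 3),   # any key whose high nibble is 0x1
    (0x00, 0x00, 1),   # all bits "don't care": default flow
]

print(classify(0x12, cam_table), classify(0x1A, cam_table), classify(0x99, cam_table))
```

The hardware evaluates all N entries at once and the priority encoder picks the winner; the loop gives the same result sequentially.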


Fast IP Lookup Algorithm

• Function
  • Search for best matching prefix using Trie algorithm

Prefix Next Hop*

01*47

10* 2110* 90001* 11011* 000110* 501011* 3

[Figure: binary trie built from the prefix table, with edges labeled 0 and 1.]
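Longest-prefix matching on a trie can be sketched as below. This is a plain one-bit-per-level binary trie written for illustration; the lookup engine on the FPX may use a more compact trie variant. The prefix table in the example is hypothetical and is not the table from the slide.

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # child for bit 0, child for bit 1
        self.next_hop = None          # set only where a prefix ends


def insert(root, prefix, next_hop):
    """Add a prefix given as a string of '0'/'1' characters ('' is '*')."""
    node = root
    for bit in prefix:
        b = int(bit)
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.next_hop = next_hop


def lookup(root, address_bits):
    """Walk the address bits, remembering the last next hop seen."""
    node, best = root, root.next_hop
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


# Hypothetical table, for illustration only.
root = TrieNode()
for prefix, hop in [("", 1), ("01", 2), ("0101", 3), ("110", 4)]:
    insert(root, prefix, hop)

print(lookup(root, "01011111"), lookup(root, "01100000"), lookup(root, "10000000"))
```

Each step down the trie is one memory read in a hardware implementation, which is why low-latency SRAM matters for lookup rate.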


Hardware Implementation in the FPX

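[Figure: FIPL implementation on the FPX. The NID FPGA connects the line card (LC) and the switch (SW). Labeled blocks in the RAD FPGA: Extract IP Headers, Remap VCIs for IP packets, On-Chip Cell Store, Packet Reassembler, Control Cell Processor, counter, IP Lookup Engine, and SRAM1 Interface with Request / Grant signals. SRAM1 and SRAM2 attach to the RAD; a copy of the trie is drawn alongside.]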


Pipelined FIPL Operations

• Throughput: Optimized by interleaving memory accesses
  • Operate 5 parallel lookups
  • t_pipelined_lookup = 550ns / 5 = 110 ns
  • Throughput = 9.1 Million packets / second

[Figure: pipeline diagram with Time (cycles) on one axis and Space (parallel lookup units on FPGA) on the other. Each lookup unit repeats the stages Generate Address, Latch ADDR into SRAM, SRAM D ← M[A], Latch Data into FPGA, Compute.]
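The throughput figure follows from the lookup latency and the number of interleaved lookup units. The function name is hypothetical, and the sketch assumes the five units keep the SRAM fully busy so that one lookup completes every 550 ns / 5.

```python
def pipelined_throughput(lookup_latency_s, parallel_lookups):
    """Lookups completed per second with interleaved lookup units."""
    return parallel_lookups / lookup_latency_s


effective_time_s = 550e-9 / 5
throughput = pipelined_throughput(550e-9, 5)

print(f"{effective_time_s * 1e9:.0f} ns per lookup, "
      f"{throughput / 1e6:.1f} M packets/s")
```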


Other Modules Implemented

• IPv4 CAM Filter
  • 104 Bit header matching
• Fast IP Lookup (FIPL)
  • Longest Prefix Match
  • MAE-West at 10M pkts/second
• Packet Content Scanner
  • Reg. Expression Search
• Data Queueing
  • Per-flow queue in SDRAM
• IPv6 Tunneling Module
  • Tunnels IPv6 over IPv4
• Statistics Module
  • Event counter
• Traffic Generator
  • Per-flow mixing
• Video Recoder
  • Motion JPEG
• Embedded Processor
  • KCPSM


Summary

• Field Programmable Port Extender (FPX)
  • Network-accessible Hardware
  • Reprogrammable Application Device
• Module Deployment
  • Modules implement fast processing on data flow
  • Network allows Arbitrary Topologies of distributed systems
• Project Website
  • http://www.arl.wustl.edu/arl/projects/fpx/