Configurable Soft Processor Arrays Using the OpenFire Processor

Stephen Craven, Cameron Patterson, Peter Athanas
Configurable Computing Lab, Virginia Tech
B212/MAPLD 2005



Outline

• Motivation
    • Single Chip Multi-Processors
    • Application-Specific Instruction set Processors
• OpenFire Processor
    • Features and Configurability
    • Performance
• Configurable Array Example: Median Image Filtering
    • Optimizations
    • Performance Comparisons


Motivation: SCMP

• Moving towards Single Chip Multi-Processors (SCMP) because:
    • Underutilized silicon budget
    • Diminishing ROI on Instruction Level Parallelism
    • Design and verification too costly
    • SCMPs more energy efficient
    • SCMPs can leverage existing IP
    • SCMPs by nature are easily scalable
    • Fast, on-chip inter-processor communication
    • SCMP is fashionable (Cell, Pentium D, Athlon x2)
• Hard and soft processors in Xilinx and Altera FPGAs


Motivation: ASIP

• Application-Specific Instruction set Processors (ASIP) allow:
    • Optimum match of instruction set to application
    • Performance benefits approaching ASICs while retaining programmability
    • Architectural features customized to application
        • Datapath width sizing
        • Memory and cache hierarchy tuning
• Available commercially through Tensilica
    • Complete design flows and generated custom toolsets
    • $$$
• Academic/Research use through ASIPMeister
    • Closed source
    • GUI only


Motivation: Configurable Arrays

• Merging SCMP with ASIP combines benefits of both:
    • Reduced design time utilizing existing IP
    • Programmability of SCMP with performance improvements of ASIP
• FPGAs ideal platform for configurable array research and implementation
    • Rapid prototyping
    • Mature tool chains
    • Xilinx and Altera offer devices with embedded processing cores (PPC and ARM)


OpenFire

• Configurable 32-bit RISC processor
    • Specialized for processor arrays
• Instructions based on Xilinx MicroBlaze
    • Uses MicroBlaze tool chain (mb-gcc, XPS, etc.)
    • Can execute subset of MicroBlaze code without modification
    • All MicroBlaze instructions supported except for division, barrel shifting, and status register and cache related instructions
    • Not burdened by features unused in arrays (interrupts, exceptions, caches, interfaces)
• Open source
    • Released under MIT license
    • Support utilities provided (C simulator, BRAM loaders, etc.)
• Differs from previously available MicroBlaze clone aeMB:
    • Works correctly and is extensively documented


Performance

• Cycle accurate with MicroBlaze except for:
    • Multiply has 5 cycle latency (3 for MicroBlaze)
    • Single cycle instruction fetches (2 cycles for MicroBlaze)
• 100 MHz on a Xilinx Virtex-II Pro 30, speed grade 6

    Processor     Area (slices)   DMIPS
    OpenFire      641             58.47
    MicroBlaze    734             58.98*

• Performance variable depending on configuration:
    • 16-bit datapath implementation reduces area to 402 slices, speed increases to 106 MHz

* Minimal MicroBlaze implementation (no OPB, division unit, barrel shifter, or cache) at 100 MHz
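
The slice counts and DMIPS figures above can be combined into a rough performance-per-area number. The following is a minimal sketch only: the helper name `dmips_per_slice` is hypothetical, and the constants are simply the values quoted on this slide.

```c
#include <assert.h>

/* Figures quoted on the slide (100 MHz, Virtex-II Pro 30, speed grade 6). */
#define OPENFIRE_SLICES    641
#define OPENFIRE_DMIPS     58.47
#define MICROBLAZE_SLICES  734
#define MICROBLAZE_DMIPS   58.98

/* Hypothetical helper: throughput per unit area, in DMIPS per slice. */
double dmips_per_slice(double dmips, int slices)
{
    assert(slices > 0);
    return dmips / (double)slices;
}
```

Calling `dmips_per_slice(OPENFIRE_DMIPS, OPENFIRE_SLICES)` gives roughly 0.091 DMIPS per slice, against roughly 0.080 for the minimal MicroBlaze configuration.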


Extensibility

• Additional instructions, including multicycle operations, can be easily added inside ALU without affecting critical path
• Potential for at least 10 new 2-operand instructions in instruction space

[Figure: OpenFire datapath block diagram. Labeled blocks: Register File (32x32), Mult*, Add, Bit Fns, Compare, MSB, ALU, PC, Imm, Data Mem]


Extensibility

• OpenFire datapath customizable from 32-bits downwards
    • Instructions are constant 32-bits wide
    • Custom datapath widths limit program size
        • Program Counter is treated same as any data word
        • 8-bit datapath => 64 instruction program
        • 16-bit datapath => 16,384 instruction program
• Planned extensions include:
    • Increasing number of Fast Simplex Link (FSL) bus I/Os
    • Fast ALU-to-FSL and FSL-to-ALU operations
    • Additional debugging capabilities
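
One way to read the program-size limits above: if the Program Counter is as wide as the datapath and addresses bytes, then each 32-bit instruction consumes four addresses. Byte addressing is an assumption here, not something the slides state, but it reproduces both quoted figures. The function name below is hypothetical.

```c
#include <assert.h>

#define INSTRUCTION_BYTES 4u  /* instructions are a constant 32 bits wide */

/* Largest program, in instructions, reachable by a PC that is
 * `datapath_bits` wide, assuming the PC holds a byte address. */
unsigned long max_program_instructions(unsigned datapath_bits)
{
    assert(datapath_bits >= 2 && datapath_bits <= 32);
    unsigned long long address_space = 1ULL << datapath_bits;  /* in bytes */
    return (unsigned long)(address_space / INSTRUCTION_BYTES);
}
```

Under this assumption an 8-bit datapath yields 256 / 4 = 64 instructions and a 16-bit datapath yields 65,536 / 4 = 16,384, matching the slide.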


Case Study: Image Filtering

• 3x3 Median Image Filter written in C
• Soft Processor Arrays created
    • Master node: MicroBlaze with DDR SDRAM
    • Slave nodes: OpenFires connected in ring network with master
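
For readers unfamiliar with the kernel, a 3x3 median filter replaces each pixel with the median of its 3x3 neighborhood. The sketch below is not the authors' code. It assumes an 8-bit grayscale image stored row-major and copies border pixels unchanged, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Median of nine pixels via partial selection sort: only the five
 * smallest values need to be placed to find the middle one. */
uint8_t median9(const uint8_t window[9])
{
    uint8_t v[9];
    for (int i = 0; i < 9; i++)
        v[i] = window[i];

    for (int i = 0; i <= 4; i++) {
        int min = i;
        for (int j = i + 1; j < 9; j++)
            if (v[j] < v[min])
                min = j;
        uint8_t tmp = v[i];
        v[i] = v[min];
        v[min] = tmp;
    }
    return v[4];
}

/* Filter a width x height image. `in` and `out` must be distinct buffers,
 * because each output pixel depends on unmodified input neighbors. */
void median_filter_3x3(const uint8_t *in, uint8_t *out,
                       size_t width, size_t height)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            /* Border policy (an assumption): copy the pixel through. */
            if (x == 0 || y == 0 || x + 1 == width || y + 1 == height) {
                out[y * width + x] = in[y * width + x];
                continue;
            }
            uint8_t window[9];
            size_t k = 0;
            for (size_t dy = 0; dy < 3; dy++)
                for (size_t dx = 0; dx < 3; dx++)
                    window[k++] = in[(y + dy - 1) * width + (x + dx - 1)];
            out[y * width + x] = median9(window);
        }
    }
}
```

Because each output pixel depends only on a small input neighborhood, the image can be split into bands and handed to separate slave nodes, which is what makes the kernel a natural fit for a processor array.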


Array Creation Process

• Automated flow for array creation
    • Edit DEFINE.V to set processor parameters
    • Create C code for master MicroBlaze and slave OpenFires
        • Verification of C code available through XMD simulator and simple OpenFire C simulator
    • Makefile-based flow automatically:
        • Creates ring network of desired size
        • Compiles programs and initializes BRAMs
        • Runs the EDK tool flow to generate a bitstream
• FSL debugging bus on the OpenFire provides observability into the processor during operation


Array Results

• Slave processor area reduced 45% by downsizing datapath to 16-bits
    • Required only slight modifications to original C code
    • Allows more OpenFires on chip, increasing throughput
• Near-linear speedup with increasing array size

[Figure: Speedup versus number of OpenFires. Both axes run from 1 to 8.]


Future Directions

• Research goal: Automated flow for creating optimized heterogeneous arrays of soft processors
    • Input: Parallel HLL description of application
    • Optimizations: datapath sizing, instruction removal / addition, dual-issue processor cores, ALU-to-network and network-to-ALU operations, microcode controller, full datapath implementations
• Optimization objective: Maximize array throughput by
    • Increasing individual node throughput
    • Reducing area to add additional nodes
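
The area side of this objective can be illustrated with a back-of-the-envelope model. The sketch below assumes ideal linear scaling and ignores the master node, interconnect, and BRAM limits. The function names and any slice budget passed in are hypothetical and not taken from the slides.

```c
#include <assert.h>

/* Number of identical nodes that fit in a given slice budget. */
unsigned nodes_that_fit(unsigned budget_slices, unsigned slices_per_node)
{
    assert(slices_per_node > 0);
    return budget_slices / slices_per_node;
}

/* Aggregate throughput under the (optimistic) assumption that every
 * added node contributes its full individual throughput. */
double array_throughput(unsigned nodes, double per_node_throughput)
{
    return (double)nodes * per_node_throughput;
}
```

For example, with a hypothetical 10,000-slice budget, 15 nodes of 641 slices fit, while shrinking each node to 402 slices allows 24. Smaller nodes raise the node count, which under this model raises array throughput as long as per-node throughput does not fall by a larger factor.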


Conclusion

• Configurable soft processor arrays offer the best of SCMPs and ASIPs
    • Simplified design
    • Improved performance
• OpenFire processor designed for use in processor arrays
    • Excellent performance / area
    • Highly configurable
• Datapath width adjustment can produce noticeable performance improvement


References

• OpenFire source code and utilities: http://www.ccm.ece.vt.edu/~scraven/
• James-Roxby, P., Schumacher, P., and Ross, C. “A Single Program Multiple Data Parallel Processing Platform for FPGAs,” FCCM’04