Application-Specific Customization of FPGA Soft-core Processors

Journal Paper Presentation

Presented by: Ahmad Sghaier
Course Instructor: Dr. Shawki Areibi
Course: ENGG 6090*6 – Winter07
Date: Apr. 5th, 2007

Outline

Introduction.

Parameterized Soft-cores.

Micro-architectural Trade-offs and ISA Sub-setting.

Fast Application-specific Customization.

Conclusion.

Resources

P. Yiannacouras, J. Steffan and J. Rose, “Exploration and Customization of FPGA-Based Soft Processors,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 26, No. 2, Feb. 2007.

D. Sheldon, R. Kumar, R. Lysecky, F. Vahid and D. Tullsen, “Application-Specific Customization of Parameterized FPGA Soft-Core Processors,” in IEEE/ACM Int. Conf. on Computer-Aided Design, Nov. 2006.

Soft-core vs. Hard-core

A hard-core processor is laid out on the chip next to the FPGA’s configurable logic fabric.

A soft-core processor is synthesized onto the FPGA’s fabric, just like any other circuit.

Soft-core processor advantages:
  Utilizing standard mass-produced FPGA parts.
  Enabling a custom number of microprocessors.

Soft-core processor disadvantages:
  Reduced processor performance.
  Higher power consumption.
  Larger size.

Commercial Soft-cores

Xilinx MicroBlaze
  A 32-bit soft-core processor.
  A single-issue, in-order execution processor.
  Configurable in five components: multiplier, barrel shifter, divider, floating-point unit (FPU), and data cache.

Altera Nios II. It has three mostly unparameterized variations:
  Nios II/e, a small unpipelined 6 cycles per instruction (CPI) processor with serial shifter and software multiplication;
  Nios II/s, a five-stage pipeline with multiplier-based barrel shifter, hardware multiplication, and instruction cache;
  Nios II/f, a large six-stage pipeline with dynamic branch prediction, and instruction and data caches.

Parameterized Soft-cores

Configurability.

Application Specific.

Size, performance and power constraints.

Configurable Parameters:
  Instantiating Functional Units (0,1).
  Unit-Specific Parameters (Cache type/size).
  Instruction Set Architecture.
  Pipelining (Depth).
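The configurable parameters listed above can be pictured as fields of a configuration record. The following is a minimal sketch only: the class name `SoftCoreConfig`, the field names, and the value ranges are hypothetical and are not taken from any vendor tool or from the papers. The ISA parameter is left out to keep the example short.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SoftCoreConfig:
    """One point in a hypothetical soft-core configuration space."""
    multiplier: bool       # 0/1: instantiate a hardware multiplier
    barrel_shifter: bool   # 0/1: instantiate a barrel shifter
    cache_kb: int          # unit-specific parameter: cache size in KB (0 = no cache)
    pipeline_depth: int    # number of pipeline stages


# Illustrative value ranges only (assumptions, not vendor data).
CACHE_SIZES_KB = (0, 2, 4, 8)
PIPELINE_DEPTHS = (2, 3, 5)


def enumerate_configs():
    """Return every combination of the parameter values above."""
    return [
        SoftCoreConfig(mul, shift, cache, depth)
        for mul, shift, cache, depth in product(
            (False, True), (False, True), CACHE_SIZES_KB, PIPELINE_DEPTHS
        )
    ]


configs = enumerate_configs()
print(len(configs))  # 2 * 2 * 4 * 3 = 48 configurations
```

Even this small space has 48 points, which suggests why exploring every configuration becomes expensive when each point needs a synthesis run.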

Exploration and Customization of FPGA-Based Soft Processors

Exploration of the micro-architectural tradeoffs for soft processors.

A set of customization techniques:
  Tuning the micro-architecture to the application.
  Subsetting the ISA.
  Hybrid approach.

To improve the performance/area of a soft processor for a specific application.

A CAD Tool.

Approach

Developing a customization tool that will generate the most customized soft-core.

SPREE (soft-processor rapid exploration environment).

Targeting functional unit customization and ISA subsetting.

SPREE

Input: Textual Description (ISA & Datapath).

ISA & datapath verification.

Constructing the Datapath.

Control Generation.

Synthesizable RTL (Verilog).

Framework

Altera Stratix I.

Comparison with Nios II variations (e, s and f).

MIPS Instruction Set.

Performance Metrics
  Area in LEs (logic elements).
  Performance in MIPS.
  Efficiency in MIPS/LE.
  Equal weight for performance and area.

Benchmark
  20 varied applications (FIR, FFT, DES, CRC, QSORT, bubble sort).
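The efficiency metric can be written out as a short calculation. This is a sketch with invented numbers: the design names and the MIPS and LE values below are hypothetical and are not results from the paper. Dividing performance by area gives both the same weight, in the sense that doubling MIPS has the same effect as halving the LE count.

```python
def efficiency(mips: float, area_le: int) -> float:
    """Performance per area, in MIPS per logic element (LE)."""
    if area_le <= 0:
        raise ValueError("area must be a positive number of LEs")
    return mips / area_le


# Hypothetical (name, MIPS, LEs) triples, for illustration only.
designs = [
    ("small", 30.0, 600),
    ("medium", 60.0, 1500),
    ("large", 80.0, 2400),
]

# Pick the design with the highest MIPS/LE.
best = max(designs, key=lambda d: efficiency(d[1], d[2]))
print(best[0], efficiency(best[1], best[2]))
```

With these made-up figures the smallest design wins on MIPS/LE even though it is the slowest, which is the kind of trade-off the metric is meant to expose.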

SPREE vs. Nios

Micro-architecture Exploration (1)

Functional Units
  Shifter Implementation (serial, shared multiplier).
  Multiplication (SW, HW).

Micro-architecture Exploration (2)

Pipelining
  Depth.
  Organization.

Micro-architecture Customization

6 micro-architectural axes.

Exhaustive search for the generated solutions.
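An exhaustive search over micro-architectural axes can be sketched as follows. The axes, their values, and the `evaluate` cost model are all assumptions made for illustration: the slides name shifter implementation, multiplication, and pipeline depth but do not list all six axes, and in practice `evaluate` would stand for generating, synthesizing, and benchmarking each processor.

```python
from itertools import product

# Hypothetical axes and values (only three of the six are sketched here).
AXES = {
    "shifter": ("serial", "multiplier_based"),
    "multiply": ("software", "hardware"),
    "pipeline_depth": (2, 3, 5),
}


def evaluate(cfg):
    """Toy stand-in for synthesis plus benchmark runs: returns (mips, area_le)."""
    mips, area = 20.0, 800
    if cfg["shifter"] == "multiplier_based":
        mips, area = mips + 5.0, area + 150
    if cfg["multiply"] == "hardware":
        mips, area = mips + 10.0, area + 200
    extra_stages = cfg["pipeline_depth"] - 2
    return mips + 6.0 * extra_stages, area + 100 * extra_stages


def exhaustive_search(axes, evaluate_fn):
    """Try every combination of axis values and keep the best MIPS/LE."""
    names = list(axes)
    best_cfg, best_eff = None, float("-inf")
    for values in product(*(axes[n] for n in names)):
        cfg = dict(zip(names, values))
        mips, area = evaluate_fn(cfg)
        eff = mips / area
        if eff > best_eff:
            best_cfg, best_eff = cfg, eff
    return best_cfg, best_eff


best_cfg, best_eff = exhaustive_search(AXES, evaluate)
print(best_cfg, round(best_eff, 4))
```

The loop visits 2 x 2 x 3 = 12 configurations here. The count grows multiplicatively with each added axis, so this style of search is only practical while evaluating one configuration stays cheap.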

ISA Subsetting

Eliminate the unused instructions.
  Simplify Control Unit.
  Reduce Area.

Less than 50% utilization of the ISA.
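Finding the unused instructions amounts to a set difference between what the processor supports and what the application contains. The sketch below assumes a heavily reduced, hypothetical instruction list with MIPS-style mnemonics and a made-up instruction listing. It is not the SPREE implementation.

```python
# Hypothetical, reduced ISA for illustration (MIPS-style mnemonics).
SUPPORTED_ISA = {
    "add", "sub", "and", "or", "sll", "srl",
    "mult", "div", "lw", "sw", "beq", "j",
}


def used_instructions(listing):
    """Return the set of mnemonics that appear in an instruction listing."""
    return {line.split()[0].lower() for line in listing if line.strip()}


def subset_isa(supported, listing):
    """Split the ISA into used and removable parts and report utilization."""
    used = used_instructions(listing) & supported
    removable = supported - used
    utilization = len(used) / len(supported)
    return used, removable, utilization


# Made-up application listing.
listing = [
    "lw $t0, 0($a0)",
    "add $t1, $t0, $t0",
    "sw $t1, 4($a0)",
    "beq $t1, $zero, done",
    "add $t2, $t1, $t0",
]

used, removable, utilization = subset_isa(SUPPORTED_ISA, listing)
print(sorted(used), sorted(removable), round(utilization, 2))
```

The instructions in `removable` are the candidates whose datapath and control logic could be dropped from the generated processor for this one application.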

Impact of ISA subsetting

Impact on Area

Impact on Performance

Results

Fine Customization Environment

An improvement in performance per area of 14.1% on average across all benchmarks.

Combined approach improved the performance per area by 24.5% on average across all applications.

Application-Specific Customization of Parameterized FPGA Soft-Core Processors

A methodology for fast application-specific customization of a parameterized FPGA soft core.

Targeting a runtime of 1-2 hours.

Near-optimal Results
  Traditional CAD with 0-1 Knapsack Algorithm.
  Synthesis-in-the-loop exploration.

Framework

Xilinx MicroBlaze (MB) on Virtex-II Pro FPGA.

Comparison with Base and Full MB.

Performance Metrics
  Area in equivalent LUTs.
  Performance as application runtime (ms).

Benchmark
  11 applications from EEMBC.

Justification

Approach-1

Traditional CAD Approach

0-1 knapsack problem
  Maximize performance.
  Constraint on area.

6 synthesis/execution runs.
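The 0-1 knapsack formulation can be sketched with the textbook dynamic program: each optional component is an item with an area cost and a performance gain, and the area constraint is the knapsack capacity. The component list and all numbers below are invented, and the sketch assumes that gains add up independently, which is a simplification.

```python
def knapsack(items, area_budget):
    """0-1 knapsack. items: list of (name, area, gain) with integer areas.

    Returns (total_gain, chosen_names) maximizing gain within area_budget.
    """
    n = len(items)
    # best[i][a] = max gain using the first i items within area a
    best = [[0.0] * (area_budget + 1) for _ in range(n + 1)]
    for i, (_, area, gain) in enumerate(items, start=1):
        for a in range(area_budget + 1):
            best[i][a] = best[i - 1][a]  # option 1: leave item i out
            if area <= a and best[i - 1][a - area] + gain > best[i][a]:
                best[i][a] = best[i - 1][a - area] + gain  # option 2: include it
    # Walk the table backwards to recover which items were included.
    chosen, a = [], area_budget
    for i in range(n, 0, -1):
        if best[i][a] != best[i - 1][a]:
            name, area, _ = items[i - 1]
            chosen.append(name)
            a -= area
    return best[n][area_budget], chosen[::-1]


# Hypothetical (component, area in LUTs, performance gain) values.
components = [
    ("multiplier", 300, 12.0),
    ("barrel_shifter", 200, 8.0),
    ("divider", 400, 5.0),
    ("fpu", 900, 20.0),
    ("data_cache", 500, 15.0),
]

gain, chosen = knapsack(components, area_budget=1000)
print(gain, chosen)
```

In this made-up case the FPU has the largest single gain but uses most of the budget, so the three cheaper components together do better.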

Approach-2

Synthesis-in-the-loop
  Pre-determines the impact each parameter individually has on design metrics,
  then searches the parameters in sequence, ordered from highest impact to lowest.

Two orderings of the impact-ordered tree: fixed-order and application-specific.
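The two steps described above (measure each parameter's individual impact, then search in impact order) can be sketched as a greedy walk. This is a simplified illustration under stated assumptions, not the algorithm from the paper: `evaluate` is a toy stand-in for a synthesis and execution run, all numbers are invented, and every parameter is treated as a simple on/off choice.

```python
# Hypothetical per-parameter effects: (runtime saved in ms, extra area in LUTs).
EFFECTS = {
    "multiplier": (30, 300),
    "barrel_shifter": (20, 200),
    "divider": (5, 400),
    "fpu": (0, 900),
    "data_cache": (25, 500),
}


def evaluate(cfg):
    """Toy stand-in for synthesis plus execution: returns (runtime_ms, area)."""
    runtime, area = 100, 1000
    for name, enabled in cfg.items():
        if enabled:
            saved, extra = EFFECTS[name]
            runtime, area = runtime - saved, area + extra
    return runtime, area


def impact_ordered_search(params, evaluate_fn, area_limit):
    base = {p: False for p in params}
    base_runtime, _ = evaluate_fn(base)

    # Step 1: measure each parameter on its own against the baseline.
    def impact(p):
        runtime, _ = evaluate_fn({**base, p: True})
        return base_runtime - runtime

    order = sorted(params, key=impact, reverse=True)

    # Step 2: visit parameters from highest to lowest impact, keeping a
    # setting only if it fits the area limit and shortens the runtime.
    current, current_runtime = dict(base), base_runtime
    for p in order:
        trial = {**current, p: True}
        runtime, area = evaluate_fn(trial)
        if area <= area_limit and runtime < current_runtime:
            current, current_runtime = trial, runtime
    return order, current, current_runtime


order, config, runtime = impact_ordered_search(list(EFFECTS), evaluate, 2000)
print(order, sorted(p for p, on in config.items() if on), runtime)
```

The number of evaluations grows linearly with the number of parameters (one per parameter to rank, one per parameter to search), in contrast to trying every combination.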

Results

Exhaustive search took 11 hours.

The fixed-order impact-ordered tree approach had the fastest runtime, at 108 minutes.

The knapsack algorithm gave results similar to the fixed-order impact-ordered tree approach.

Similar results for a 50% constraint.

Figure panels: No constraint; Fixed 80% constraint; Per-application 80% constraint.

Results

Reimplementation on Spartan2 FPGA.

1.5 hours runtime for the fixed-order impact-ordered tree.

200 minutes for the application-specific impact-ordered tree.

Scalability

Increasing the number of parameters increases the runtime.

Fixed-order impact-ordered tree and knapsack scale well.

Conclusion

Impact of customization on performance and area.

Emphasis on performance.

Customizable parameters span the micro-architecture and the ISA.

Use of near-optimal solutions to save on runtime.

Possibility to look for finer customization, but scalability has to be addressed.

Finer customization might consider 0-1 parameters or multi-valued parameters.

THANK YOU

Q&A