
High-Level Synthesis for FPGAs:

From Prototyping to Deployment

(To appear in IEEE TCAD 2011)

Course Presentation

By: Murtaza Merchant

Overview

• Introduction

• Motivation

• Reasons for failure

• Historical Background

• Design strategies for HLS tool

• AutoPilot HLS design flow for FPGAs

• Platform modeling for FPGAs

• Advances in Synthesis and optimization Algorithms

• Integration with Domain-Specific design platforms

• Results

• Conclusion

Introduction

• Automatic synthesis of high-level descriptions into low-level, cycle-accurate (RTL) specifications

• High-level specification
  • Untimed
  • Partially timed

• Targeted at FPGAs

• State-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains

Why do we need HLS tools?

• Embedded processors
  • Hardware/software co-design

• SoC design complexity

• Behavioral IP reuse

• System-level verification
  • Transaction-level modeling (TLM)

• Time-to-market

• Ease of use
  • Code density reduces by 7x-10x

Reasons for Failure

• Lack of comprehensive design language support
  • Behavioral HDL was used

• C and C++ lack the constructs and semantics to represent
  • Design hierarchy
  • Timing
  • Synchronization
  • Concurrency

• Lack of reusable and portable design specifications
  • Functional specifications were highly tool-dependent

• Lack of satisfactory quality of results (QoR)

Historical Background

• Academic efforts
  • HAL, developed at Bell-Northern Research [DAC ‘86]
  • ADAM system, developed at USC [DAC ‘89]
  • Hercules/Hebe HLS system at Stanford [Euro ASIC ‘90]
  • Hyper/Hyper-LP system at UC Berkeley [ICCAD ‘92]

• Industry efforts
  • Catapult C from Mentor Graphics [2004]
  • C-to-Silicon Compiler from Cadence [2008]
  • Synphony C Compiler from Synopsys [2009]
  • AutoPilot from AutoESL (UCLA xPilot project) [DAC 2009]

Design strategies for HLS tool

• Restrict the use of dynamic constructs
  • Pointers
  • Recursion
  • Polymorphism

• Use hardware-oriented language extensions
  • HardwareC, SpecC, Handel-C
  • Libraries (SystemC)

• Efficient parallel architectures

• Allow an optimization-oriented design process
  • Modification
  • Refactoring

AutoPilot HLS design flow for FPGAs

• State-of-the-art commercial HLS tool

• Inputs: high-level language
  • ANSI C, C++
  • SystemC

• Outputs: RTL
  • Verilog, VHDL
  • Cycle-accurate SystemC

• Automatic co-simulation

Platform modeling for FPGAs

• Target-specific synthesis and optimization
  • Mapping groups of operations to platform-specific blocks
  • Prefabricated architecture blocks in the FPGA
    • DSP48 blocks
    • BRAMs

• Component pre-characterization
  • Modeling process
  • Characterize delay/area/power for each hardware resource
  • Select the best implementation choice using the characterization data
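The selection step in the last bullet can be sketched as follows. This is a minimal illustration, assuming a characterization library is just a list of records with a delay and an abstract area cost; the record keys, the entry names, and every number are made up for the example and are not measured data or AutoPilot's actual library format.

```python
def pick_implementation(candidates, clock_period_ns):
    """Return the smallest-area candidate whose delay fits in one clock period.

    candidates: list of dicts with hypothetical keys "name", "delay_ns", "area".
    Returns None if no candidate meets the timing target.
    """
    feasible = [c for c in candidates if c["delay_ns"] <= clock_period_ns]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["area"])


# Made-up characterization entries for a multiplier (abstract cost units).
MULTIPLIER_LIBRARY = [
    {"name": "mul_lut", "delay_ns": 6.0, "area": 300},
    {"name": "mul_dsp", "delay_ns": 3.5, "area": 40},
    {"name": "mul_dsp_pipelined", "delay_ns": 2.0, "area": 60},
]

choice = pick_implementation(MULTIPLIER_LIBRARY, clock_period_ns=4.0)
print(choice["name"])  # mul_dsp
```

A real flow would weigh power as well and would characterize each resource per device family; the point here is only that pre-characterized data turns the choice into a table lookup.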

Advances in Synthesis and optimization Algorithms

• Efficient mathematical programming formulations for scheduling

• Soft constraints and applications for platform-based optimization

• Pattern mining for efficient sharing

• Memory analysis and optimizations

Mathematical programming formulations for scheduling

• Heuristic: list scheduling
  • Leads to sub-optimal solutions

• Exact formulation: integer linear programming
  • Difficult to scale to large designs
  • O(m×n) binary variables to encode a scheduling solution with n operations and m steps

• System of difference constraints (SDC)
  • Linear-programming formulation
  • Efficient and scalable
  • O(n) variables to encode a scheduling solution with n operations
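For reference, the list-scheduling heuristic named in the first bullet can be sketched as below. This is a textbook-style sketch, not AutoPilot's scheduler: it assumes unit-latency operations, an acyclic dependency graph, and a priority equal to the longest path to a sink. The function name and the small example graph are hypothetical.

```python
from collections import defaultdict


def list_schedule(ops, deps, units):
    """Resource-constrained list scheduling for unit-latency operations.

    ops:   dict mapping operation name -> resource type (e.g. "mul")
    deps:  list of (producer, consumer) pairs forming a DAG
    units: dict mapping resource type -> units available per time step
    Returns a dict mapping operation name -> time step (0-based).
    """
    for kind in ops.values():
        if units.get(kind, 0) <= 0:
            raise ValueError(f"no unit available for resource type {kind!r}")

    preds, succs = defaultdict(set), defaultdict(set)
    for u, v in deps:
        preds[v].add(u)
        succs[u].add(v)

    # Priority: number of operations on the longest path from op to a sink.
    depth = {}

    def longest_path(op):
        if op not in depth:
            depth[op] = 1 + max((longest_path(s) for s in succs[op]), default=0)
        return depth[op]

    for op in ops:
        longest_path(op)

    schedule, step = {}, 0
    while len(schedule) < len(ops):
        # Ready: unscheduled, with every predecessor placed in an earlier step.
        ready = [op for op in ops
                 if op not in schedule
                 and all(p in schedule and schedule[p] < step for p in preds[op])]
        ready.sort(key=lambda op: (-depth[op], op))
        free = dict(units)
        for op in ready:
            if free[ops[op]] > 0:
                free[ops[op]] -= 1
                schedule[op] = step
        step += 1
    return schedule


OPS = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}
DEPS = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]
result = list_schedule(OPS, DEPS, {"mul": 1, "add": 1})
print(result)
```

Because the heuristic commits to one step at a time using a fixed priority, it cannot revisit earlier choices, which is why it can end up with sub-optimal schedules.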

SDC based linear-programming formulation

• Scheduling variable si ∈ [0, Lv] for each operation i
  • Represents the time step at which the operation is scheduled

• Constraints represented in integer-difference form: si – sj ≤ dij

• The generated constraint matrix is totally unimodular
  • Every square submatrix has a determinant of 0 or ±1
  • With integer right-hand sides, a linear program over a totally unimodular constraint matrix is guaranteed to have an optimal integral solution
  • No expensive branch-and-bound procedures

J. Cong and Z. Zhang, “An efficient and versatile scheduling algorithm based on SDC formulation,” in Proc. DAC'06, pp. 433-438.
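To make the integer-difference form concrete, the sketch below finds a feasible integral schedule for a set of constraints si – sj ≤ dij. It deliberately swaps in a different technique from the slide: instead of solving a linear program with an objective, it runs Bellman-Ford shortest paths on the constraint graph, which only establishes feasibility. The function name and the tuple encoding (i, j, d) are assumptions made for this example.

```python
def solve_sdc(num_ops, constraints):
    """Find integers s[0..num_ops-1] with s[i] - s[j] <= d for every (i, j, d).

    Returns a list of non-negative integers, or None if the system is infeasible.
    """
    if num_ops == 0:
        return []
    # Distances from a virtual source joined to every node by a 0-weight edge.
    dist = [0] * num_ops
    for _ in range(num_ops):
        changed = False
        for i, j, d in constraints:
            # s_i - s_j <= d corresponds to an edge j -> i with weight d.
            if dist[j] + d < dist[i]:
                dist[i] = dist[j] + d
                changed = True
        if not changed:
            break
    else:
        # Still relaxing after num_ops passes: negative cycle, so infeasible.
        return None
    shift = min(dist)
    return [x - shift for x in dist]


# Chain 0 -> 1 -> 2 with unit latencies, finishing within 2 steps of the start.
chain = [(0, 1, -1), (1, 2, -1), (2, 0, 2)]
print(solve_sdc(3, chain))  # [0, 1, 2]

# Tightening the latency bound to 1 step makes the system infeasible.
print(solve_sdc(3, [(0, 1, -1), (1, 2, -1), (2, 0, 1)]))  # None
```

The solution is integral whenever every dij is an integer, which mirrors the integrality property the slide attributes to the totally unimodular constraint matrix.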

Representing constraints for SDC

• Expressed in integer-difference form
  • Data dependencies
  • Control dependencies
  • Relative timing in I/O protocols
  • Latency upper bounds

• What about resource constraints?
  • Use heuristics
  • Generate pair-wise orderings

J. Cong and Z. Zhang, “An efficient and versatile scheduling algorithm based on SDC formulation,” in Proc. DAC'06, pp. 433-438.
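The constraint kinds listed on this slide can each be written as one or two inequalities of the form si – sj ≤ dij. The helpers below are a hedged sketch of that encoding; the helper names, the (i, j, d) tuple layout, and the operation names in the example are all hypothetical.

```python
def dependency(producer, consumer, latency=1):
    """Consumer starts at least `latency` steps after producer:
    s_producer - s_consumer <= -latency."""
    return (producer, consumer, -latency)


def latency_bound(first, last, max_steps):
    """`last` starts at most `max_steps` after `first`:
    s_last - s_first <= max_steps."""
    return (last, first, max_steps)


def exact_offset(first, second, offset):
    """Relative I/O timing, s_second - s_first == offset, as two inequalities."""
    return [(second, first, offset), (first, second, -offset)]


def satisfies(schedule, constraints):
    """Check a schedule (dict op -> step) against (i, j, d) constraints."""
    return all(schedule[i] - schedule[j] <= d for i, j, d in constraints)


constraints = (
    exact_offset("read_addr", "read_data", 2)     # protocol: data 2 steps after address
    + [dependency("read_data", "mul"),            # data dependency
       latency_bound("read_addr", "mul", 4)]      # latency upper bound
)

good = {"read_addr": 0, "read_data": 2, "mul": 3}
bad = {"read_addr": 0, "read_data": 2, "mul": 2}
print(satisfies(good, constraints), satisfies(bad, constraints))  # True False
```

A pair-wise ordering generated for a resource constraint has the same shape as `dependency`: forcing one operation to run before another on a shared unit is just another difference inequality.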

Soft constraints for multiple design intentions

• Design intentions are expressed as constraints

• Strict (hard) constraints limit the ability to handle multiple conflicting design intentions

• This rules out improving some aspects of the design at the cost of reasonable, estimated violations of others

• Alternative: use soft constraints, which allow some constraints to be violated

Soft constraints for multiple design intentions

• Consider a scheduling problem with hard and soft constraints
  • G, H: hard and soft constraint matrices
  • Hj: jth row of H
  • vj: violation variable
  • φj(vj): penalty term added to the objective function
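The formulation itself did not survive in this transcript. One way to write a problem of this kind, consistent with the symbols defined above, is sketched below. The decision vector x, the cost vector c, and the right-hand sides p and q are assumed names introduced for this sketch; they do not come from the slide.

```latex
\begin{aligned}
\text{minimize}\quad   & c^{T}x + \sum_{j} \phi_j(v_j) \\
\text{subject to}\quad & Gx \le p \\
                       & H_j x - v_j \le q_j \quad \text{for each soft constraint } j \\
                       & v_j \ge 0
\end{aligned}
```

When vj = 0, soft constraint j holds exactly as a hard constraint would; a positive vj measures how far it is violated, and φj(vj) charges for that violation in the objective.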

Pattern mining for efficient sharing

• Sharing of functional units, storage units or interconnects by multiple operations in a time-multiplexed manner

• Need for multiplexers

• Large multiplexers are expensive on FPGA platforms
  • Sharing can add more overhead than benefit

• Extract common patterns in the data-flow graph
  • Different instances of the same pattern can share resources

• Graph edit distance is used as the metric to measure the similarity of two patterns

J. Cong and W. Jiang, “Pattern-based behavior synthesis for FPGA resource reduction,” in Proc. FPGA'08, Feb. 2008, pp. 107-116.
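The multiplexer overhead mentioned above can be estimated from a binding. The sketch below is a simplification and not the pattern-mining algorithm of the cited paper: it assumes each shared unit needs, on every input port, one multiplexer input per distinct source signal. The function name and the signal names are hypothetical.

```python
def mux_inputs(bindings):
    """Estimate multiplexer sizes implied by a resource binding.

    bindings: dict mapping a functional unit to the operations bound to it,
              where each operation is a tuple of source names, one per port.
    Returns a dict mapping each unit to a list of mux sizes per input port
    (a size of 1 means the port needs no multiplexer).
    """
    sizes = {}
    for unit, operations in bindings.items():
        ports = zip(*operations)  # group the sources port by port
        sizes[unit] = [len(set(port)) for port in ports]
    return sizes


# Two additions share one adder; the second operand happens to be the same signal.
shared = {"add0": [("a", "b"), ("c", "b")]}
print(mux_inputs(shared))  # {'add0': [2, 1]}

# Three unrelated additions on one adder need a 3-input mux on each port.
unrelated = {"add0": [("a", "b"), ("c", "d"), ("e", "f")]}
print(mux_inputs(unrelated))  # {'add0': [3, 3]}
```

Sharing a whole pattern rather than individual operations keeps the connections inside the pattern fixed, so only the pattern's external inputs need multiplexers.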

Pattern mining example

Figures (a) and (b): original DFG and resource binding

Figures (c) and (d): DFG and resource binding after pattern mining

J. Cong and W. Jiang, “Pattern-based behavior synthesis for FPGA resource reduction,” in Proc. FPGA'08, Feb. 2008, pp. 107-116.

Memory analysis and optimizations

• FPGA performance is limited by memory bandwidth
  • Memory partitioning is critical to meeting performance targets

• Automatic partitioning of array elements across multiple physical memory blocks is necessary for pipelined loops
  • To increase throughput
  • To reduce power

• Capture all possible reference conflicts under partitioning in a conflict graph

• An iterative algorithm performs scheduling and memory partitioning using the conflict graph

J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic memory partitioning and scheduling for throughput and power optimization,” in Proc. ICCAD '09, Nov. 2009, pp. 697-704.
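A small example shows why partitioning matters for a pipelined loop. The sketch below assumes cyclic partitioning (element k goes to bank k mod N), single-port banks, and a loop body that reads a[i + offset] for several fixed offsets in the same cycle. It is an illustration only and does not reproduce the conflict-graph algorithm of the cited paper; the function names are hypothetical.

```python
def bank_of(index, num_banks):
    """Cyclic partitioning: element `index` is stored in bank index mod num_banks."""
    return index % num_banks


def has_conflict(offsets, num_banks):
    """True if two references a[i + offset] fall into the same bank.

    Under cyclic partitioning the result does not depend on i, because two
    references collide exactly when their offsets differ by a multiple of
    num_banks, so checking i = 0 is enough.
    """
    banks = [bank_of(off, num_banks) for off in set(offsets)]
    return len(set(banks)) < len(banks)


def min_banks(offsets, max_banks=64):
    """Smallest bank count with no same-cycle conflict, or None if none found."""
    for n in range(1, max_banks + 1):
        if not has_conflict(offsets, n):
            return n
    return None


print(min_banks([0, 1, 2]))  # 3: a[i], a[i+1], a[i+2] need three banks
print(min_banks([0, 2, 4]))  # 3: two banks would put all three in bank 0
print(min_banks([0, 4]))     # 3: offsets 0 and 4 collide with 1 or 2 banks
```

With a conflict-free bank count, all references of one iteration can be served in a single cycle, which is what lets the loop start a new iteration every cycle.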

Memory partitioning and scheduling for throughput optimization.

J. Cong, W. Jiang, B. Liu, and Y. Zou, “Automatic memory partitioning and scheduling for throughput and power optimization,” in Proc. ICCAD '09, Nov. 2009, pp. 697-704.

Integration with Domain-Specific design platforms

• Interface cores
  • Tight architecture requirements
  • Implemented in RTL
  • About 5%

• Processor subsystem

• HLS-generated design

AutoPilot HLS Results (Sphere decoder)

• Design targeted to a Xilinx Virtex-5 FPGA

• Application exhibits a large amount of parallelism
  • Resource sharing
  • Time-division multiplexing

• AutoPilot achieves better resource utilization and shorter development time

Conclusion

• The latest generation of FPGA HLS tools has made significant progress in:
  • Providing wide language coverage
  • Robust compilation technology
  • Platform-based modeling
  • Synthesis and optimization techniques
  • Domain-specific system-level integration

• These tools provide highly competitive quality of results
  • Comparable to or better than manual RTL designs

• Transition from research and investigation to selected deployment

Challenges

• Support of memory hierarchy
  • Many applications need access to external memory
  • Current tools lack support for a memory hierarchy
  • Designers are exposed to the details of bus interfaces and memory controllers

• Higher-level models
  • Current tools extract instruction-level and loop-level parallelism
  • Task-level parallelism is difficult to extract from C/C++

• In-system design validation and debugging
  • Timing-accurate RTL simulation is used
  • Debugging at the RTL level is complicated
  • Need to perform most verification and debugging in the C domain

Questions?

Advances in Simulation and verification

• Develop, debug, and functionally verify a design at a higher level

• Reduces verification effort
  • Easier to trace, identify, and fix bugs at a higher abstraction level
  • Simulation at the higher level is orders of magnitude faster than RTL simulation

• More comprehensive tests and greater coverage

Automatic co-simulation

• Direct reuse of the original test framework in C/C++ to verify the correctness of the synthesized RTL

• C-to-RTL transactors connect high-level interfacing constructs (parameters and global variables) with pin-level signals in RTL

• Helps designers avoid the time-consuming manual creation of an RTL test bench

Equivalence Checking

• High-level models to RTL checking

• Combinational equivalence checking requires a one-to-one correspondence between the flip-flops and latches of the two designs

• Significant state differences exist between the high-level model and the RTL model

• Sequential logic equivalence checking (SEC) is therefore necessary

Sequential Logic Equivalence Checking (SEC)

• Model extraction from the SLM
  • Extracting the hardware model from SystemC or C/C++

• Sequential analysis
  • Efficient unrolling of finite-state machines (FSMs) to align the SLM and RTL state machines to synchronizing states

• Bit-level/word-level solvers
  • BDD and SAT solvers to address system-level-to-RTL formal verification

• Mechanism for specifying temporal mappings at I/Os and state points
  • SEC tool to automatically infer these mappings
  • Commercial tool from Calypto Design

A. Mathur, M. Fujita, E. Clarke, and P. Urard, “Functional equivalence verification tools in high-level synthesis flows,” IEEE Design & Test of Computers, vol. 26(4), pp. 88-95, Dec. 2009.