    Hindawi Publishing Corporation, VLSI Design, Volume 2012, Article ID 298396, 14 pages, doi:10.1155/2012/298396

    Research Article

    Automatic Generation of Optimized and Synthesizable Hardware Implementation from High-Level Dataflow Programs

    Khaled Jerbi (1, 2), Mickael Raulet (1), Olivier Deforges (1), and Mohamed Abid (2)

    (1) IETR/INSA, UMR CNRS 6164, 35043 Rennes, France
    (2) CES Laboratory, National Engineering School of Sfax, 3038 Sfax, Tunisia

    Correspondence should be addressed to Khaled Jerbi, [email protected]

    Received 16 December 2011; Revised 18 April 2012; Accepted 15 May 2012

    Academic Editor: Maurizio Martina

    Copyright 2012 Khaled Jerbi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    In this paper, we introduce the Reconfigurable Video Coding (RVC) standard, based on the idea that video processing algorithms can be defined as a library of components that can be updated and standardized separately. The MPEG RVC framework aims at providing a unified high-level specification of current MPEG coding technologies using a dataflow language called Cal Actor Language (CAL). CAL is associated with a set of tools to design dataflow applications and to generate hardware and software implementations. Before this work, the existing CAL hardware compilers did not support high-level features of CAL. After presenting the main notions of the RVC standard, this paper introduces an automatic transformation process that analyses the non-compliant features and makes the required changes in the intermediate representation of the compiler while keeping the same behavior. Finally, the implementation results of the transformation on video and still image decoders are summarized. We show that the obtained results can largely satisfy the real-time constraints for an embedded design on FPGA, as we obtain a throughput of 73 FPS for the MPEG-4 decoder and 34 FPS for the coding and decoding process of the LAR coder using a video of CIF image size. This work resolves the main limitation of hardware generation from CAL designs.

    1. Introduction

    User demand for high-quality video is growing, which causes a notable increase in the complexity of video codec algorithms. These algorithms have to be implemented on a target architecture that can be hardware or software. In 2007, the notion of Electronic System Level Design (ESLD) was introduced in [1] as a solution to decrease the time to market using high-level synthesis, which is the automatic compilation of a high-level description into a low-level one called register transfer level (RTL). The high-level description is governed by models of computation, which are the rules defining the way data is transferred and processed. Many solutions were developed to automate the hardware generation of complex algorithms using ESLD. Synopsys developed a C to gate compiler called Synphony [2]. Mentor Graphics also created a C to HDL compiler called Catapult C [3, 4]. For its NIOS II, Altera introduced C2H as a converter from C to HDL [5, 6]. To extend Matlab for hardware generation from functional blocks, Mathworks created a hardware generator for FPGA design [7]. In the university research field, the STICC laboratory in France developed a high-level synthesis tool called GAUT that extracts parallelism and generates VHDL code from a pure C description [8, 9]. The common point between all the tools quoted above is that they are application-specific generators, which means that they are not always efficient on an entire multi-component system description.

    In this context, CAL [10] was introduced in the Ptolemy II project [11] as a general-purpose, target-agnostic dataflow language based on the Dataflow Process Network (DPN) Model of Computation [12], which is related to the Kahn Process Network (KPN) [13]. The MPEG community standardized the RVC-CAL language in the MPEG RVC (Reconfigurable Video Coding) standard [14]. This standard provides a framework to describe the different functions of a codec as a network of functional blocks developed in RVC-CAL and called actors. Some hardware compilers of RVC-CAL were developed, but their limitation is that they cannot compile high-level structures of the language, so these structures have to be manually transformed.

    In [15], we presented an original functional method to speed up HDL generation using a software platform for rapid design and validation of a high-complexity dataflow architecture, but going from the high-level to the low-level representation was still manual. Therefore, we proposed to add automatic transformations to make any RVC-CAL design synthesizable.

    This paper extends a preliminary work presented in [16] by introducing efficient optimizations and their impact on the area and time consumption of the design. The transformation tool analyzes the RVC-CAL code and performs the required transformations to obtain synthesizable code, whatever the complexity of the considered actor. In Section 2, we explain the main advantages of using the MPEG RVC standard for signal processing algorithms and the key notions of the RVC-CAL language and its behavioral structures and mechanisms. The proposed transformation process is detailed in Section 4, and finally hardware implementation results of the MPEG-4 Part 2 decoder and the LAR codec are presented in Sections 5 and 6.

    2. Background

    Since the beginning of ISO/IEC/WG11 (MPEG) in 1988 with the appearance of MPEG-1, many video codecs have been developed (MPEG-4 Part 2, MPEG SVC, MPEG AVC, HEVC, etc.) with increasing complexity, and so they take longer to produce. In addition, every standard has a set of profiles depending on the implementation target or the user specifications. Consequently, it became a tough task for standard communities to develop, test, and standardize a decoder at any given time. Moreover, the specification of each standard is monolithic, which makes it harder to reuse or update existing algorithms. This observation gave rise to a new design methodology standard called Reconfigurable Video Coding, introduced by MPEG.

    In the following, we present an overview of the MPEG RVC standard and its associated tools and frameworks. We also present the main features of the CAL actor language and the limitations that motivated this work.

    2.1. MPEG RVC. RVC presents a modular library of elementary components (actors). The most important and attractive features of RVC are reconfigurability and flexibility. An RVC design is a dataflow directed graph with actors as vertices and unidirectional FIFO channels as edges. An example of a graph is shown in Figure 1.
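
    To make the graph model concrete, here is a minimal Python sketch of actors connected by unidirectional FIFO channels. It is an illustration only and not RVC-CAL: the names `Actor`, `connect`, `run`, and `demo` are hypothetical, and the firing rule is simplified to "one token available on every input".

```python
from collections import deque


class Actor:
    """Hypothetical actor: consumes one token per input and emits func(tokens)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.inputs = []   # FIFO channels read by this actor
        self.outputs = []  # FIFO channels written by this actor

    def can_fire(self):
        # Simplified firing rule (assumption): a token on every input FIFO.
        return bool(self.inputs) and all(len(f) > 0 for f in self.inputs)

    def fire(self):
        tokens = [f.popleft() for f in self.inputs]
        result = self.func(*tokens)
        for f in self.outputs:
            f.append(result)


def connect(src, dst):
    """Create a unidirectional FIFO edge from src to dst and return it."""
    fifo = deque()
    src.outputs.append(fifo)
    dst.inputs.append(fifo)
    return fifo


def run(actors):
    """Fire any ready actor until no actor can fire any more."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True


def demo():
    # Graph: source FIFO -> scale -> offset -> sink FIFO
    scale = Actor("scale", lambda x: 2 * x)
    offset = Actor("offset", lambda x: x + 1)
    source, sink = deque([1, 2, 3]), deque()
    scale.inputs.append(source)
    connect(scale, offset)
    offset.outputs.append(sink)
    run([offset, scale])  # the listing order of the actors does not matter
    return list(sink)


print(demo())
```

    Because each actor only looks at its own FIFOs, the order in which actors are listed does not change the result, which is the property that makes such a graph easy to reconfigure.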

    Defining video processing algorithms from elementary components is easy and rapid with RVC, since every actor is completely independent from the other actors of the network. Every actor has its own scheduler, variables, and behavior. Its only communication with the rest of the network is through its input ports, which are connected to the FIFO channels and are checked for the presence of tokens; as explained later, an internal scheduler then allows or blocks the execution of elementary functions called actions, depending on their corresponding firing rules (see Section 3). Thus, RVC ensures concurrency, modularity, reuse, scalable parallelism, and encapsulation. In [17], Janneck et al. show that, for hardware designs, the RVC standard allows a gain of 75% in development time and considerably reduces the number of lines compared with manual HDL code. To manage all the presented concepts of the standard, RVC presents a framework based on the use of the following.

    (i) A subset of the CAL actor language called RVC-CAL that describes the behavior of the actors (see details in Section 2.2).

    (ii) A language describing the network called FNL (Functional unit Network Language) that lists the actors, the connections, and the parameters of the network. FNL is an XML dialect that allows a multilevel description of the actor hierarchy, which means that a functional unit can be a composition of other functional units connected in another network.

    (iii) Bitstream Syntax Description Language (BSDL) [18, 19] to describe the structure of the bitstream.

    (iv) An important Video Tool Library (VTL) of actors containing all MPEG standards. This VTL is under development and it already contains 3 profiles of MPEG-4 decoders (Simple Profile, Progressive High Profile, and Constrained Baseline Profile).

    (v) Tools for editing, simulation, validation, and automatic generation of implementations:

    (a) The OpenDF framework [20] is an interpreter infrastructure that allows the simulation of hierarchical actor networks. Xilinx contributed to the project by developing a hardware compiler called OpenForge (available at http://openforge.sourceforge.net/) [21] to generate HDL implementations from RVC-CAL designs.

    (b) The Open RVC-CAL Compiler (Orcc) (available at http://orcc.sourceforge.net/) [19] is an RVC-CAL compiler under development. It compiles a network of actors and generates code for both hardware and software targets. Orcc is based on works on actors and actions analysis and synthesis [22, 23]. In the front-end of Orcc, a graph network and its associated CAL actors are parsed into an abstract syntax tree (AST) and then transformed into an intermediate representation (IR) that undergoes typing, semantic checks, and several transformations in the middle-end and in the back-end. Finally, pretty printing is applied on the resulting IR to generate a chosen implementation language (C, Java, Xlim, LLVM, etc.).
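
    The front-end, middle-end, and back-end staging described in (b) can be sketched on a toy language. This is not Orcc code and does not use its API: the grammar (`name = operand op operand`), the tuple-based IR, and the function names `front_end`, `middle_end`, `back_end_c`, and `compile_to_c` are all assumptions made for illustration, and the AST step is skipped for brevity.

```python
def atom(text):
    """Turn a token into an int literal or leave it as a variable name."""
    return int(text) if text.isdigit() else text


def front_end(source):
    """Parse lines such as 'a = 2 * 4' into a toy IR: (target, (left, op, right))."""
    ir = []
    for line in source.strip().splitlines():
        target, expr = (part.strip() for part in line.split("=", 1))
        left, op, right = expr.split()
        ir.append((target, (atom(left), op, atom(right))))
    return ir


def middle_end(ir):
    """IR-to-IR transformation: fold operations whose operands are both literals."""
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    out = []
    for target, expr in ir:
        left, op, right = expr
        if isinstance(left, int) and isinstance(right, int) and op in ops:
            expr = ops[op](left, right)  # replace the operation by its value
        out.append((target, expr))
    return out


def back_end_c(ir):
    """Pretty-print the IR as C-like declarations (one possible target)."""
    lines = []
    for target, expr in ir:
        if isinstance(expr, int):
            rhs = str(expr)
        else:
            rhs = " ".join(str(part) for part in expr)
        lines.append(f"int {target} = {rhs};")
    return "\n".join(lines)


def compile_to_c(source):
    return back_end_c(middle_end(front_end(source)))


print(compile_to_c("a = 2 * 4\nb = a + 1"))
```

    The point of the staging is that the middle-end works only on the IR, so the same transformation can be reused by any back-end; supporting another target would only mean writing another printer function.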

    At this point, the question is: why RVC-CAL and not C? A C description involves not only the specification of the algorithms but also the way inherently parallel computations are sequenced, the way data is exchanged throw in