[IEEE 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip...

Design-Space Exploration between FPGA and ASIF

Muhammad Amin Qureshi, Husain Parvez
Karachi Institute of Economics and Technology, Korangi Creek, 75190, Karachi, Pakistan
{ameen.qureshi, husain.parvez}@pafkiet.edu.pk

978-1-4799-5810-8/14/$31.00 ©2014 IEEE


Page 1: [IEEE 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC) - Montpellier, France (2014.5.26-2014.5.28)] 2014 9th International Symposium

Abstract— Application-Specific Inflexible FPGA (ASIF) [1] is an FPGA reduced and optimized for a known set of application circuits. An ASIF achieves a considerable area gain compared to an FPGA. However, this area gain comes at the expense of reduced routing flexibility, which makes the architecture highly irregular. As a consequence, the irregular ASIF architecture cannot be fabricated through the tile-based abutment process, among other negative implications. This paper explores the design space between FPGA and ASIF to generate reconfigurable Regular-ASIF architectures with adjustable regularity and flexibility. Our preliminary experiments show promising results: on average, an ASIF architecture is 5.6x smaller, whereas a Regular-ASIF is 1.02x to 2.1x smaller, than a mesh-based unidirectional FPGA architecture.

Keywords — FPGA; ASIF; domain-specific FPGA; reconfigurable hardware;

I. INTRODUCTION AND RELATED WORK

A Field Programmable Gate Array (FPGA) is reconfigurable hardware used in diverse applications. It contains programmable logic blocks and interconnects that enable the programmer to reconfigure the same hardware resource for different applications. In contrast, an Application Specific Integrated Circuit (ASIC) is designed and optimized to run only a specific application. The flexibility of an FPGA gives it many advantages over an ASIC, but comes at a huge cost. There is a wide gap between traditional FPGAs and ASICs in terms of performance (speed), area, dynamic power consumption and unit cost [2]. Unlike FPGAs, ASICs have a higher Non-Recurring Engineering (NRE) cost and a longer time-to-market, which makes them suitable only for large-volume production. In addition, if there is an error, an ASIC cannot be altered after its fabrication. Structured ASICs emerged to overcome the NRE cost and time-to-market constraints of ASICs while retaining almost the same performance and power consumption. In a Structured ASIC, an upper mask-programmable layer is used to form interconnects for programmable logic blocks in the lower layer. This, however, has not solved the post-fabrication inflexibility of ASICs: like an ASIC, a Structured ASIC still supports only a single application [3].

Embedded hard blocks are used in FPGA architectures to increase their area and speed efficiency. Traditional FPGA architectures consist of fine-grained logic blocks, which are inefficient for certain data-path operations such as addition, multiplication, arithmetic computation, and DSP operations. The use of coarse-grained logic or hard blocks along with fine-grained logic blocks is found to be more efficient in terms of area and performance [2], thus giving rise to heterogeneous or hybrid FPGAs. Significant work has been done to exploit the efficiency of coarse-grained logic blocks, including the design of domain-specific area-efficient FPGAs and Coarse-Grained Reconfigurable Architectures (CGRAs) [4] [5]. For example, CGRAs like RaPiD (Reconfigurable Pipelined Datapath) [6] and PipeRench [7] are intended for application-specific domains such as multimedia and signal processing.

In domain-specific reconfigurable hardware design, Compton et al. have proposed automatic generation of configurable ASIC cores (cASIC) [9]. Keeping in mind the requirements of domain-specific Systems-on-a-Chip (SoCs), where ASICs might be used as accelerators, the authors proposed cASIC cores to accelerate the most common compute-intensive applications. The main idea is that the software code runs on a processor, while the compute-intensive portions of the code run on an accompanying cASIC to achieve high speed and efficiency. The target cASIC is customized for a set of application circuits known beforehand.

Parvez et al. have proposed a similar solution, namely the Application Specific Inflexible FPGA (ASIF) [1]. Like cASIC, the ASIF architecture is optimized for a known set of application circuits which execute at mutually exclusive times. ASIF uses a top-down approach: a suitable FPGA architecture is defined first and then reduced for the given set of application circuits. The reduction of a highly regular FPGA architecture, however, generates a highly irregular ASIF architecture, making it extremely difficult and inefficient to use a tile-based abutment process for layout. Moreover, due to the reduced flexibility, it becomes extremely difficult to map new application circuits on an ASIF. This work intends to reduce these drawbacks by exploring the design space between ASIF and FPGA. We propose to add regularity to ASIF to overcome the drawbacks stated above. The proposed architecture is termed Regular ASIF, or RASIF, since it is an ASIF with a largely regular architecture in terms of tiles.

The rest of this paper is divided into four major sections. Section II gives a brief introduction to different ASIF generation techniques. Section III describes our Regular ASIF generation techniques. Section IV presents experimentation and analysis. The conclusion and future work are presented in Section V.

II. INTRODUCTION TO ASIF

ASIF is reconfigurable hardware that is reduced from an FPGA for a known set of netlists (application circuits) which execute at mutually exclusive times. It is similar to cASIC in that it is also optimized only for a known set of netlists. However, it differs from cASIC in its generation technique and its usage. cASIC uses a bottom-up construction approach, while ASIF uses a top-down reduction approach. cASIC is proposed for acceleration purposes, while ASIF is proposed for stand-alone application circuits [1].

To generate an ASIF, a group of netlists is selected and a suitable FPGA architecture is defined that can accommodate all the selected netlists. For our experiments, we have used a mesh-based homogeneous FPGA architecture with unidirectional, single-driver, length-1 routing wires, as shown in Figure 1. The figure shows what is called a tile in the FPGA architecture. The connection boxes connect the input and output pins of the CLB (Configurable Logic Block) with the routing channel surrounding it. When the connection box parameters (Fc(in) and Fc(out)) are set to 1.0, the connectivity is maximal. The connection boxes are connected with the rest of the architecture through the switch boxes.

In the algorithm used, each application circuit to be mapped on the FPGA architecture is first represented as a set of CLB instances. The simulated annealing algorithm [8] places each netlist on the FPGA architecture in such a way that logic blocks of the FPGA are shared amongst the instances of different netlists; instances belonging to the same netlist, however, cannot share a single logic block. The PathFinder routing algorithm [8] routes each netlist individually on the FPGA routing resources. When all the netlists are placed and routed, the unused routing resources of the FPGA are removed to generate an ASIF.

Variations in the placement and routing algorithms yield four different ASIF generation techniques. In the ASIF-1 technique, no routing wire is shared amongst different netlists. The channel width required by ASIF-1 is therefore the sum of the channel widths required by each netlist in the group for which the ASIF is generated. The ASIF generated through this technique retains only the switches in the connection boxes; no switch in the switch boxes is retained.

In the ASIF-2 technique, routing wires are shared amongst different netlists. The channel width required by ASIF-2 is therefore the maximum channel width required by any netlist in the group. The number of wires used in ASIF-2 is much lower than in ASIF-1, but the number of switches is considerably higher.

In the ASIF-3 technique, routing wires are efficiently shared amongst different netlists. The cost function of the congestion-driven PathFinder routing algorithm is modified to efficiently share the wires and switches used by different netlists.

In the ASIF-4 technique, an efficient placement technique is employed along with the efficient wire sharing technique. The cost function of the simulated annealing placement algorithm is modified to efficiently place the instances of different netlists. The efficient placement technique optimizes the inter-netlist placement as well as the intra-netlist placement for the given netlists. Experiments reveal that the ASIF-4 technique gives the best results of the four [1]. This work employs the ASIF-4 generation technique.
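The difference between the unshared (ASIF-1) and shared (ASIF-2) channel widths, and the final removal of unused routing resources, can be sketched as follows. This is only an illustrative sketch, not the authors' tool: the function names, the per-netlist channel widths, and the resource identifiers are all hypothetical.

```python
from typing import Dict, Hashable, Set


def asif_channel_width(min_widths: Dict[str, int], share_wires: bool) -> int:
    """Channel width needed by a group of netlists.

    share_wires=False models ASIF-1 (sum of the per-netlist widths);
    share_wires=True models ASIF-2 (largest per-netlist width).
    """
    widths = list(min_widths.values())
    return max(widths) if share_wires else sum(widths)


def remove_unused(fpga_resources: Set[Hashable],
                  used_by_netlist: Dict[str, Set[Hashable]]) -> Set[Hashable]:
    """Keep only the routing resources used by at least one netlist."""
    used: Set[Hashable] = set()
    for resources in used_by_netlist.values():
        used |= resources
    return fpga_resources & used


# Hypothetical per-netlist minimum channel widths (not taken from the paper).
widths = {"netlist_a": 14, "netlist_b": 12, "netlist_c": 10}
asif1_width = asif_channel_width(widths, share_wires=False)
asif2_width = asif_channel_width(widths, share_wires=True)

# Hypothetical routing-resource identifiers (wires "w*", switches "sw*").
fpga = {"w0", "w1", "w2", "w3", "sw0", "sw1"}
usage = {"netlist_a": {"w0", "sw0"}, "netlist_b": {"w0", "w2"}}
asif = remove_unused(fpga, usage)
```

With these made-up numbers the unshared width is the sum of the three widths and the shared width is the largest of them, which is the contrast the text draws between ASIF-1 and ASIF-2.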

Fig. 1 – An FPGA tile shown with adjacent switch boxes and connection boxes.

III. REGULAR ASIF (RASIF)

One of the main advantages of an FPGA is its highly regular architecture, which makes its layout generation much easier. A homogeneous FPGA requires only a few tiles, which can be abutted to construct an FPGA of any size. In contrast, when the unused routing resources are removed from the FPGA (to generate an ASIF), the regularity of the tiles is severely disturbed. The variations in the configuration of connection boxes and switch boxes make ASIF tiles different from each other. The ASIFs generated through the four techniques described above are thus irregular to the extent that the number of unique tiles in an ASIF is approximately equal to the total number of tiles in the architecture.

This work aims to introduce regularity and flexibility in the ASIF architecture. The proposed architecture is termed a regular ASIF, or simply RASIF. Since an ASIF is obtained by reducing the flexibility of an FPGA, reintroducing flexibility into ASIF tiles necessarily increases its area, so the right compromise between area and flexibility needs to be found. The RASIF architecture is regular enough to use the tile-based abutment process for its layout generation and flexible enough to map new application circuits. The regularity introduced in the ASIF places it much closer to the FPGA architecture.

Just as a homogeneous FPGA is built from a few abutted tiles, constructing a regular ASIF architecture requires reducing the number of unique tiles in the ASIF architecture. A regularity parameter is introduced so that the design space between ASIF and FPGA can be explored. This regularity parameter is chosen to be the number of unique tiles in the architecture. We can thus have a list of similar tiles that are repeated in a restricted manner to generate a RASIF layout.

Fig. 2 – RASIF Technique-1: Tiles repeating in whole columns

This work presents the following two RASIF generation techniques. Both techniques are employed after an FPGA is reduced to an ASIF.

RASIF-1

In this approach, the switch box (SB) and connection box (CB) configurations of the ASIF tiles are merged column by column: the configuration of the SBs and CBs in each RASIF tile becomes the union of the configurations of the SBs and CBs, respectively, of all the ASIF tiles found in the same column. This is repeated for all the columns. In this way, all the tiles in a single column of the RASIF become similar, although tiles in different columns may differ. The number of unique tiles in this case equals the number of columns in the architecture. The RASIF-1 generation technique is depicted in Fig. 2.

As expected, this approach does not yield good results: the RASIF area increases and becomes almost equal to that of an FPGA.
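The column-wise union behind RASIF-1 can be illustrated with a small sketch. The representation is an assumption made for illustration only: each tile is modeled as a set of tile-local switch names (hypothetical identifiers), and the function names are not from the paper.

```python
from typing import FrozenSet, List

# Retained SB/CB switches of one tile, as tile-local names (an assumption).
Tile = FrozenSet[str]


def rasif1(asif: List[List[Tile]]) -> List[List[Tile]]:
    """Give every tile the union of the switches retained in its column."""
    rows, cols = len(asif), len(asif[0])
    merged = [frozenset().union(*(asif[r][c] for r in range(rows)))
              for c in range(cols)]
    return [[merged[c] for c in range(cols)] for _ in range(rows)]


def count_unique_tiles(grid: List[List[Tile]]) -> int:
    """Number of distinct tile configurations in the grid."""
    return len({tile for row in grid for tile in row})


# Hypothetical 2x2 ASIF, indexed as asif[row][column].
asif = [
    [frozenset({"s0"}), frozenset({"s1"})],
    [frozenset({"s2"}), frozenset({"s1", "s3"})],
]
rasif = rasif1(asif)
```

In this toy grid the four distinct ASIF tiles collapse to one tile per column, and every merged tile carries at least as many switches as any tile it replaces, which is why the area grows toward that of the FPGA.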

RASIF-2

The regularity and area of a RASIF are controlled by deciding how many tiles must be made similar in a single column. We introduce another parameter: the number of tiles in a block, or block size. The switch box and connection box configuration of each tile is repeated in all the tiles residing in the corresponding block only. All the tiles in a single RASIF block are therefore similar, but the blocks residing in a single RASIF column have different tiles with different areas. In this approach, some area may be wasted, as the width of a column is computed from the maximum width of any tile in the column. The RASIF-2 generation technique is depicted in Fig. 3. For example, if we have an ASIF of size 36x36 and we want to generate a RASIF of block size 5, then each column has 7 complete blocks of 5 similar tiles each, plus an incomplete 8th block with only one tile. Thus, we get 8 unique tiles in each column. With 36 columns, a 36x36 RASIF with block size 5 has 8 x 36 = 288 unique tiles.

Obviously, the larger the block size, the higher the area and flexibility, and the smaller the number of unique tiles. This parameter is tuned according to the situation and requirements. We do not, however, want the RASIF to be so regular that it effectively becomes an FPGA.
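The block arithmetic of the 36x36 example can be checked with a short sketch. The helper names are hypothetical, and the count assumes that no two blocks happen to end up with identical configurations.

```python
import math
from typing import List, Tuple


def blocks_in_column(rows: int, block_size: int) -> List[Tuple[int, int]]:
    """Half-open row ranges [start, end) of the blocks in one column."""
    return [(start, min(start + block_size, rows))
            for start in range(0, rows, block_size)]


def unique_tiles(rows: int, cols: int, block_size: int) -> int:
    """One unique tile per block, assuming no two blocks coincide."""
    return math.ceil(rows / block_size) * cols


# The example from the text: a 36x36 architecture with block size 5.
blocks = blocks_in_column(36, 5)
total = unique_tiles(36, 36, 5)
```

Setting the block size equal to the number of rows leaves a single block per column, so the count falls to the number of columns, which is the RASIF-1 case.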

Fig. 3 – RASIF Technique-2: Tiles repeating in a block

IV. EXPERIMENTATION AND ANALYSIS

We have categorized 20 MCNC benchmark circuits according to their sizes and divided them into seven groups, each containing 2 to 3 netlists of roughly the same size. The netlists in each of the seven groups are listed in Table 1.

Table 1: Groups of 20 MCNC netlists for experimentation

Group   Netlists                  Largest Channel Width in Group   Largest Size of Netlist in Group
1       tseng, ex5p, apex4        14                               36x36
2       dsip, misex, diffeq       12                               33x33
3       alu4, bigkey, des         10                               42x42
4       apex2, s298, seq          14                               44x44
5       ex1010, pdc               18                               68x68
6       spla, frisc, elliptic     14                               61x61
7       clma, s38417, s38584.1    16                               92x92

For each group, the FPGA size is set to the largest size required by any netlist in the group. Similarly, the channel width of the FPGA is set to the largest of the minimum channel widths required by the individual netlists in the group.
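The selection of FPGA parameters for a group amounts to taking two maxima, as the following sketch shows. The function name is hypothetical and the per-netlist values are invented for illustration; only the resulting group maxima (a 36x36 array and a channel width of 14) correspond to Group 1 of Table 1.

```python
from typing import Dict, Tuple


def fpga_parameters(netlists: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Pick the FPGA size and channel width for a group of netlists.

    netlists maps a netlist name to (array side, minimum channel width).
    """
    side = max(s for s, _ in netlists.values())
    width = max(w for _, w in netlists.values())
    return side, width


# Per-netlist values are hypothetical; the paper reports only group maxima.
group1 = {"tseng": (33, 10), "ex5p": (33, 14), "apex4": (36, 12)}
side, width = fpga_parameters(group1)
```

Note that the two maxima need not come from the same netlist, as in this made-up example.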

For each group of netlists, an FPGA, an ASIF, and different versions of RASIF are generated and compared. Fig. 4 compares the area of the FPGA, RASIF-1, RASIF-2 and ASIF for the seven groups of netlists. The area of RASIF-1 is almost equal to that of an FPGA, because the switch box and connection box configurations of all the tiles in a single column are made similar, which increases the area drastically. The RASIF-2 technique with block size equal to the total number of tiles in a column (labeled Full RASIF-2) is equivalent to the RASIF-1 technique and thus gives the same area results. The ASIF is generated using the ASIF-4 generation technique described in Section II. The area of the ASIF can be decreased further if the channel width is increased; however, we preferred to use a common channel width for all the proposed architectures. RASIF-2 with block sizes of 2, 4, 6 and 10 tiles is also compared. As the block size increases, the area of RASIF-2 also increases, while the number of unique tiles decreases roughly by a factor equal to the block size. For example, for a block size of 2, the number of unique tiles in the RASIF is about half of that in the ASIF.

The average area gain of a RASIF compared with the FPGA ranges from 1.02x (for RASIF-1) to 2.1x (for RASIF-2 with block size 2). The maximum area gain of RASIF-2 for a particular group (Group 2) is found to be 2.8x compared with the FPGA.

V. CONCLUSION AND FUTURE WORK

In this paper, we have proposed an application-specific regular FPGA architecture named Regular-ASIF. The area results for different groups of netlists were presented and, on average, the Regular-ASIF was shown to be 1.02x to 2.1x smaller than an FPGA. For one group, this gain is as high as 2.8x. Timing analysis still needs to be done and is left for future work. To achieve greater area gains, the placement and routing algorithms need to be varied so that more regular ASIF architectures are generated in the first place.

The FPGA architecture used in this work is not fully optimized in terms of connection box configuration (Fc(in) and Fc(out)) and the use of long wires. We intend to tune these parameters to get a more realistic comparison of RASIF and FPGA. Moreover, this work concentrates on a homogeneous FPGA architecture. In the future, we also intend to use heterogeneous FPGA and RASIF architectures.

ACKNOWLEDGMENT

The authors are extremely grateful to the National Information and Communication Technology Research and Development Fund (ICT R&D) of the Ministry of Information Technology, Pakistan, for funding this work.

Fig. 4 – Area comparison of FPGA, ASIF and RASIF of different groups of netlists.


REFERENCES

[1] H. Parvez, Z. Marrakchi, and H. Mehrez, "ASIF: Application Specific Inflexible FPGA," IEEE Field-Programmable Technology, pp. 112-119, 2009.

[2] I. Kuon and J. Rose, "Measuring gap between FPGAs and ASICs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 26, no. 2, pp. 203-215, 2007.

[3] K. Wu and Y. Tsai, "Structured ASIC, Evolution or Revolution?" Proceedings of International Symposium on Physical Design, pp. 103-106, 2004.

[4] A. Ye, J. Rose, and D. Lewis, "Architecture of datapath-oriented coarse-grain logic and routing for FPGAs," Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 61-64, 2003.

[5] C. H. Ho, C. W. Yu, P. H. W. Leong, W. Luk, and S. J. E. Wilton, "Domain-specific hybrid FPGA: architecture and floating-point applications," Proc. International Conference on Field-Programmable Logic and Applications, pp. 196-201, 2007.

[6] D. Cronquist, et al., "Architecture design of reconfigurable pipelined datapaths," Proc. of 20th Anniversary Conference on Advanced Research in VLSI, Atlanta, pp. 23-40, 1999.

[7] S. C. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, and R. R. Taylor, "PipeRench: A Reconfigurable Architecture and Compiler," IEEE Computer, vol. 33, no. 4, pp. 70-77, 2000.

[8] V. Betz, A. Marquardt, and J. Rose, "Architecture and CAD for Deep-Submicron FPGAs," January 1999.

[9] K. Compton and S. Hauck, "Automatic Design of Area-Efficient Configurable ASIC Cores," IEEE Transactions on Computers, vol. 56, no. 5, pp. 662-672, May 2007.