U.maheswaran Presentation
![Page 1: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/1.jpg)
CGRA WITH FLOATING POINT SUPPORT FOR SIGNED ALU OPERATIONS
MAPPED CGRA FOR FLOATING POINT OPERATIONS
U. Maheswaran, M.E. [A.E]
PG Scholar, M.N.M. Jain Engineering College,
Chennai, Tamilnadu, India
P. Venugopal, M.E. [A.E], M.B.A.
Asst. Professor, Dept. of ECE,
M.N.M. Jain Engineering College,
Chennai, Tamilnadu, India
Submitted by
![Page 2: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/2.jpg)
CONTENTS:
Introduction
Problem Definition
Technical Background
Proposed Idea
Design Methodology
Design Aspects
Results
Conclusions
Queries
![Page 3: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/3.jpg)
INTRODUCTION
Why reconfigurable computing?
What are CGRAs?
Why not FPGAs for Complex applications?
Reconfigurable computing & Mapping applications?
![Page 4: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/4.jpg)
PROBLEM DEFINITION
FPGAs have a fine-grained architecture (bit-level granularity), so for complex applications flexibility is reduced and complexity increases.
CGRAs address this with word- or nibble-level granularity.
However, CGRA applications are domain specific, and CGRAs support only integer arithmetic.
We propose a new architecture that supports both integer and floating-point operations.
![Page 5: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/5.jpg)
TECHNICAL BACKGROUND
Systems with reconfigurable logic modules now have a significant impact on many technical applications.
FPGAs are widely used across technical domains to implement complex algorithms. However, because of their fine-grained architecture, they offer less flexibility and lower efficiency for such workloads.
![Page 6: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/6.jpg)
CONTINUED..
If such a fine-grained architecture is used for complex algorithms, flexibility has to be sacrificed and the system becomes more complex.
Coarse-grained architectures have a larger granularity: the resource entities into which the hardware (or the problem) is divided have a larger grain size.
![Page 7: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/7.jpg)
CONTINUED..
Hence, using a CGRA for complex algorithms preserves flexibility.
However, typical CGRAs come without a floating-point unit, and they are domain specific.
To overcome these barriers, heuristic mapping functions are used so that a floating-point unit can be created dynamically by the mapping algorithm.
![Page 8: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/8.jpg)
CONTINUED..
Hence this floating-point-enabled CGRA can be used for complex applications involving floating-point arithmetic.
E.g.: DSP filter design, graphics accelerators, and many multimedia applications.
Thus the hardware flexibility of the system is improved by using high-performance hardware, and programming flexibility is achieved through the mapping algorithms.
![Page 9: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/9.jpg)
PROPOSED IDEA
The real challenge is the grain size of the reconfigurable device.
The aim is to improve its granularity by grouping the basic units of the reconfigurable device around a data bus of a particular data width.
![Page 10: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/10.jpg)
EXISTING MODEL- FPGA
Existing FPGAs support algorithms based on integer arithmetic.
The functional units (FUs) can execute common word-level operations.
The register files for each computing module are localized.
There is no shared bus communication among the reconfigurable computing modules.
![Page 11: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/11.jpg)
MAPPED CGRA WITH FLOATING POINT SUPPORT
![Page 12: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/12.jpg)
CONTINUED..
The target architecture consists of a reconfigurable computing module (RCM).
The RCM executes loop kernel code segments.
A general-purpose processor controls the RCM.
These units are connected by a shared bus.
Each processing element (PE) can be dynamically reconfigured to perform an arithmetic or logic operation.
![Page 13: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/13.jpg)
DESIGN FLOW
![Page 14: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/14.jpg)
DATA PATH & CONTROL PATH DESIGN
An ASIC is typically implemented as an FSMD (finite state machine with datapath).
It consists of a control path and a data path.
Control path design: generation of a set of control signals (the control word) at every clock cycle.
Data path design: the computational tasks selected by the control word.
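The control path / data path split can be illustrated with a small Python model. This is only a sketch, not the design from the slides: the state names, the control-word fields (`load`, `add`, `dec`) and the repeated-addition task are all assumptions chosen to keep the example short. It expects a non-negative repeat count `n`.

```python
from dataclasses import dataclass


@dataclass
class ControlWord:
    """One set of control signals, issued once per clock cycle (hypothetical fields)."""
    load: bool = False  # load operands into the datapath registers
    add: bool = False   # acc <- acc + x
    dec: bool = False   # count <- count - 1


class Datapath:
    """Registers plus the computation selected by the control word."""

    def __init__(self):
        self.acc = 0
        self.x = 0
        self.count = 0

    def step(self, cw, x_in=0, n_in=0):
        if cw.load:
            self.acc, self.x, self.count = 0, x_in, n_in
        if cw.add:
            self.acc += self.x
        if cw.dec:
            self.count -= 1

    @property
    def zero(self):
        # Status flag fed back to the control path
        return self.count == 0


def control_path(state, zero):
    """FSM: map (state, status flag) to (next state, control word)."""
    if state == "IDLE":
        return "RUN", ControlWord(load=True)
    if state == "RUN" and not zero:
        return "RUN", ControlWord(add=True, dec=True)
    return "DONE", ControlWord()


def run_fsmd(x, n):
    """Compute x * n by repeated addition; returns (result, clock cycles)."""
    dp, state, cycles = Datapath(), "IDLE", 0
    while state != "DONE":
        state, cw = control_path(state, dp.zero)  # control path: one word per cycle
        dp.step(cw, x_in=x, n_in=n)               # data path: execute that word
        cycles += 1
    return dp.acc, cycles
```

Calling `run_fsmd(3, 4)` steps through one load cycle, four add cycles and one final cycle in which the controller sees the zero flag and stops.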
![Page 15: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/15.jpg)
RECONFIGURING THE TARGET ARCHITECTURE
The mapped CGRA contains a co-processor (KCPSM3 PicoBlaze) on the host FPGA, used to reconfigure the grain size of the FPGA.
KCPSM3 (Constant(K) Coded Programmable State Machine) is a free soft processor core from Xilinx for use in their FPGAs.
![Page 16: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/16.jpg)
CONTINUED..
Xilinx documents the PicoBlaze as requiring just 96 FPGA slices.
It runs the kernel in a loop and reconfigures the CLBs into the required PEs.
Reconfiguration details are stored in configurable caches.
The floating-point adder unit is then synthesized on the mapped CGRA and the addition is performed.
![Page 17: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/17.jpg)
DESIGN ASPECTS
![Page 18: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/18.jpg)
RESULTING PE OF MAPPED CGRA
![Page 19: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/19.jpg)
FLOATING POINT OPERATIONS WITH MAPPED CGRA
A pair of PEs is used for floating-point operations: one PE computes the mantissa and the other handles the exponent.
Steiner tree routing is preferred for faster routing performance.
After ILP/QEA, heuristic approaches are followed to increase performance.
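How a mantissa PE and an exponent PE might cooperate on a floating-point addition can be sketched in Python using integer operations only. This is an illustrative model under stated assumptions, not the authors' datapath: the 24-bit mantissa width, the function names, and the truncating (non-IEEE) rounding are all choices made for the example.

```python
import math

MANT_BITS = 24  # assumed mantissa width (hypothetical; not given in the slides)


def pack(x):
    """Convert a Python float to (signed integer mantissa, exponent)."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return int(m * (1 << MANT_BITS)), e - MANT_BITS


def unpack(value):
    mant, exp = value
    return math.ldexp(mant, exp)


def exponent_pe(ea, eb):
    """Exponent PE: pick the larger exponent and the alignment shift per operand."""
    if ea >= eb:
        return ea, 0, ea - eb
    return eb, eb - ea, 0


def _shift_right(m, s):
    # Shift the magnitude and keep the sign (low bits are simply discarded)
    return (abs(m) >> s) * (-1 if m < 0 else 1)


def mantissa_pe(ma, mb, shift_a, shift_b):
    """Mantissa PE: align both mantissas, then add them as integers."""
    return _shift_right(ma, shift_a) + _shift_right(mb, shift_b)


def normalize(m, e):
    """Renormalize: the mantissa is shifted while the exponent is adjusted to match."""
    if m == 0:
        return 0, 0
    sign, mag = (-1 if m < 0 else 1), abs(m)
    while mag >= (1 << MANT_BITS):       # overflow after the add
        mag >>= 1
        e += 1
    while mag < (1 << (MANT_BITS - 1)):  # cancellation after a subtract
        mag <<= 1
        e -= 1
    return sign * mag, e


def fp_add(a, b):
    (ma, ea), (mb, eb) = a, b
    if ma == 0:
        return b
    if mb == 0:
        return a
    e, shift_a, shift_b = exponent_pe(ea, eb)
    m = mantissa_pe(ma, mb, shift_a, shift_b)
    return normalize(m, e)
```

For example, `unpack(fp_add(pack(1.5), pack(2.25)))` gives `3.75`. The several sequential steps (compare exponents, align, add, normalize) are consistent with a floating-point operation taking several clock cycles on the PE pair.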
![Page 20: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/20.jpg)
CONTINUED
Thus, each operation in a loop body is spatially mapped to a dedicated PE.
The main advantage of spatial mapping is that each PE may not need reconfiguration during execution of a loop because of its fixed functionality.
However, it has a disadvantage that spreading all operations of the loop body over the reconfigurable array may require a very large array size.
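The trade-off of spatial mapping can be made concrete with a short Python sketch. It assumes a simple row-major placement and hypothetical operation names; real mappers also account for routing between PEs, which is ignored here.

```python
def spatial_map(ops, rows, cols):
    """Assign each loop-body operation to its own PE of a rows x cols array.

    `ops` is a list of unique operation names (hypothetical labels).
    Placement is plain row-major order; routing cost is not modeled.
    """
    if len(ops) > rows * cols:
        raise ValueError(
            f"loop body has {len(ops)} operations "
            f"but the array has only {rows * cols} PEs"
        )
    # divmod(i, cols) yields the (row, column) of the i-th PE
    return {op: divmod(i, cols) for i, op in enumerate(ops)}
```

A five-operation loop body fits a 2 x 3 array with one PE left idle, while the same body on a 2 x 2 array raises an error: every operation needs its own PE, so the array size must grow with the loop body.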
![Page 21: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/21.jpg)
CONTINUED
The operations that a PE (or a pair of PEs) in our CGRA can execute are classified into three groups.
1) Arithmetic/logical operations
A PE can execute ALU operations in one clock cycle.
2) Multiply/divide/load/store operations
These operations are executed in several clock cycles by dedicated functional resources located outside the PE array.
3) Floating-point operations
A pair of PEs can execute floating-point operations, taking several clock cycles.
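The three groups can be written down as a small lookup in Python. The opcode names below are hypothetical, and the sketch records only what the classification states: how many PEs each group occupies and whether it completes in one clock cycle. The exact multi-cycle latencies are not given, so none are assumed.

```python
from enum import Enum


class OpClass(Enum):
    ALU = "arithmetic/logical"             # one PE, one clock cycle
    SHARED = "multiply/divide/load/store"  # dedicated resources outside the PE array
    FLOAT = "floating-point"               # a pair of PEs, several clock cycles


# Hypothetical opcode names, chosen for illustration only
OP_CLASS = {
    "add": OpClass.ALU, "sub": OpClass.ALU, "and": OpClass.ALU, "or": OpClass.ALU,
    "mul": OpClass.SHARED, "div": OpClass.SHARED,
    "load": OpClass.SHARED, "store": OpClass.SHARED,
    "fadd": OpClass.FLOAT, "fsub": OpClass.FLOAT,
}

# PEs occupied inside the array by one operation of each class
PES_NEEDED = {OpClass.ALU: 1, OpClass.SHARED: 0, OpClass.FLOAT: 2}


def is_single_cycle(op):
    """Only ALU operations finish in one clock cycle."""
    return OP_CLASS[op] is OpClass.ALU


def pes_required(ops):
    """PEs occupied in the array when every operation gets dedicated PEs."""
    return sum(PES_NEEDED[OP_CLASS[op]] for op in ops)
```

For a loop body `["load", "fadd", "add", "store"]`, `pes_required` returns 3: two PEs for the floating-point add, one for the integer add, and none for the load and store, which run on the resources outside the array.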
![Page 22: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/22.jpg)
CONCLUSION
Thus increased performance and flexibility are achieved in both programming and hardware by this mapping process over a reconfigurable device.
Faster, more flexible reconfigurable hardware, mapped in this way to support floating-point operations, can attract attention in the embedded systems industry.
![Page 23: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/23.jpg)
Queries
?
![Page 24: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/24.jpg)
AUTHORS
P. Venugopal, M.E. [A.E], M.B.A.
Asst. Professor, Dept. of ECE,
M.N.M. Jain Engineering College,
Chennai, Tamilnadu, India
[email protected]
+91-9444420128
U. Maheswaran, M.E. [A.E]
PG Scholar,
M.N.M. Jain Engineering College,
Chennai, Tamilnadu, India
[email protected]
+91-9944215357
![Page 25: U.maheswaran Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061204/547faa805906b50d298b45fb/html5/thumbnails/25.jpg)