U.maheswaran Presentation


Transcript of U.maheswaran Presentation

Page 1: U.maheswaran Presentation

CGRA WITH FLOATING POINT SUPPORT FOR SIGNED ALU OPERATIONS

MAPPED CGRA FOR FLOATING POINT OPERATIONS

Submitted by

U. Maheswaran, M.E. [A.E]

PG Scholar, M.N.M. Jain Engineering College,

Chennai, Tamil Nadu, India

P. Venugopal, M.E. [A.E], M.B.A.

Asst. Professor, Dept. of ECE,

M.N.M. Jain Engineering College,

Chennai, Tamil Nadu, India

Page 2: U.maheswaran Presentation

CONTENTS:

Introduction

Problem Definition

Technical Background

Proposed Idea

Design Methodology

Design Aspects

Results

Conclusions

Queries

Page 3: U.maheswaran Presentation

INTRODUCTION

Why reconfigurable computing?

What are CGRAs?

Why not FPGAs for complex applications?

Reconfigurable computing and mapping applications?

Page 4: U.maheswaran Presentation

PROBLEM DEFINITION

FPGAs are architectures with fine-grained packing (bit-level granularity), so flexibility is reduced and complexity increases.

Hence CGRAs come into the picture. A CGRA has word/nibble-level granularity.

But CGRA applications are domain specific, and a typical CGRA supports only integer arithmetic.

We propose a new architecture supporting both integer and floating-point operations.

Page 5: U.maheswaran Presentation

TECHNICAL BACKGROUND

In the current scenario, systems with reconfigurable logic modules have a great impact on many technical applications.

FPGAs are widely used in many technical domains to implement complex algorithms. But FPGAs offer less flexibility and lower efficiency for such algorithms because of their fine-grained architecture.

Page 6: U.maheswaran Presentation

CONTINUED..

If we use such a fine-grained architecture for complex algorithms, flexibility has to be sacrificed and the system becomes more complex.

Coarse-grained architectures have greater granularity: the resource entities (hardware/problem) are divided into larger grains.

Page 7: U.maheswaran Presentation

CONTINUED..

Hence, when a CGRA is used for complex algorithms, flexibility is preserved.

But typical CGRAs come without a floating-point unit. They are also domain specific.

To overcome these barriers, heuristic mapping functions are used so that a floating-point unit can be created dynamically by the mapping algorithm.

Page 8: U.maheswaran Presentation

CONTINUED..

Hence this floating-point-enabled CGRA can be used for complex applications involving floating-point arithmetic.

E.g.: DSP filter design, graphics accelerators, and many multimedia applications.

Thus the hardware flexibility of the system is improved by using high-performance hardware, and programming flexibility is achieved through the mapping algorithms.

Page 9: U.maheswaran Presentation

PROPOSED IDEA

The real challenge is the grain size of the reconfigurable device.

The aim is to improve the granularity by grouping the basic units of the reconfigurable device around a data bus of a particular data width.

Page 10: U.maheswaran Presentation

EXISTING MODEL - FPGA

Existing FPGAs support algorithms based on integer arithmetic.

The functional units (FUs) can execute common word-level operations.

The register files of each computing module are local to that module.

There is no shared-bus communication among the reconfigurable computing modules.

Page 11: U.maheswaran Presentation

MAPPED CGRA WITH FLOATING POINT SUPPORT

Page 12: U.maheswaran Presentation

CONTINUED..

The target architecture consists of a reconfigurable computing module (RCM).

The RCM executes loop kernel code segments.

A general-purpose processor is present to control the RCM.

These units are connected by a shared bus.

Each processing element (PE) can be dynamically reconfigured to perform an arithmetic/logic operation.
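The idea of a PE that is reconfigured at run time to perform one arithmetic/logic operation can be illustrated with a small software model. This is only a sketch under assumptions: the class name `ProcessingElement`, the operation names, and the 16-bit word width are hypothetical and are not taken from the slides.

```python
import operator


class ProcessingElement:
    """Toy model of a word-level PE (names and width are assumptions)."""

    # Hypothetical set of ALU operations a PE can be configured for.
    OPS = {
        "add": operator.add,
        "sub": operator.sub,
        "and": operator.and_,
        "or": operator.or_,
        "xor": operator.xor,
    }

    def __init__(self, width=16):
        self.mask = (1 << width) - 1  # keep results inside one word
        self.op = None                # unconfigured until reconfigure()

    def reconfigure(self, op_name):
        """Select the operation this PE performs from now on."""
        if op_name not in self.OPS:
            raise ValueError(f"unsupported operation: {op_name}")
        self.op = op_name

    def execute(self, a, b):
        """Apply the configured operation to two word-sized operands."""
        if self.op is None:
            raise RuntimeError("PE has not been configured")
        return self.OPS[self.op](a, b) & self.mask
```

The same PE object serves different roles over time: calling `reconfigure("add")` and later `reconfigure("xor")` changes what `execute` computes without creating new hardware resources in the model.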

Page 13: U.maheswaran Presentation

DESIGN FLOW

Page 14: U.maheswaran Presentation

DATA PATH & CONTROL PATH DESIGN

An ASIC is typically implemented as an FSMD (finite state machine with datapath).

It consists of a control path and a data path.

Control path design: generation of a set of control signals (the control word) at every clock cycle.

Data path design: execution of the computational tasks described by the control word.
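The split between control path and data path can be sketched in software. The example below is an illustration only, not the design from the slides: it assumes a hypothetical three-state machine (`INIT`, `LOOP`, `DONE`) that sums the integers from n down to 1, with the control path emitting one control word per clock cycle and the data path executing it.

```python
def control_path(state, count_is_zero):
    """Return (next_state, control_word) for the current clock cycle."""
    idle = {"clear_acc": False, "add": False, "dec": False}
    if state == "INIT":
        return "LOOP", {**idle, "clear_acc": True}
    if state == "LOOP":
        if count_is_zero:
            return "DONE", idle
        return "LOOP", {**idle, "add": True, "dec": True}
    return "DONE", idle


def datapath(regs, control_word):
    """Update the (acc, count) registers as the control word dictates."""
    acc, count = regs
    if control_word["clear_acc"]:
        acc = 0
    if control_word["add"]:
        acc += count
    if control_word["dec"]:
        count -= 1
    return acc, count


def run_fsmd(n):
    """Clock the FSMD until DONE; return (result, cycles used)."""
    state, regs, cycles = "INIT", (0, n), 0
    while state != "DONE":
        status = regs[1] == 0               # status signal fed back to control
        state, word = control_path(state, status)
        regs = datapath(regs, word)
        cycles += 1
    return regs[0], cycles
```

With this model, `run_fsmd(4)` takes one cycle to clear the accumulator, four cycles to accumulate, and one cycle to detect termination.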

Page 15: U.maheswaran Presentation

RECONFIGURING THE TARGET ARCHITECTURE

The mapped CGRA contains a co-processor (the KCPSM3 PicoBlaze) on the host FPGA, used to reconfigure the grain size of the FPGA.

KCPSM3 (Constant(K) Coded Programmable State Machine) is a free soft processor core from Xilinx for use in their FPGAs.

Page 16: U.maheswaran Presentation

CONTINUED..

Xilinx documents the PicoBlaze as requiring just 96 FPGA slices.

It runs the kernel in a loop and reconfigures the CLBs into the required PEs.

Reconfiguration details are stored in configurable caches. The floating-point adder unit is then synthesized on the mapped CGRA and the addition is performed.

Page 17: U.maheswaran Presentation

DESIGN ASPECTS

Page 18: U.maheswaran Presentation

RESULTING PE OF MAPPED CGRA

Page 19: U.maheswaran Presentation

FLOATING POINT OPERATIONS WITH MAPPED CGRA

A pair of PEs is used for floating-point operations: one PE computes the mantissa and the other handles the exponent.

Steiner tree routing is preferred for faster routing performance. After ILP/QEA, heuristic approaches are followed to increase performance.
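The division of labour between the exponent PE and the mantissa PE can be sketched for an addition. This is a simplified illustration under stated assumptions, not the authors' adder: it uses a hypothetical toy format (`ToyFloat`, value = ±mantissa × 2^exponent with integer fields) rather than IEEE 754, and it aligns by shifting left so the result stays exact and needs no rounding or normalization.

```python
from typing import NamedTuple


class ToyFloat(NamedTuple):
    """Hypothetical toy format: value = (-1)**sign * mantissa * 2**exponent."""
    sign: int      # 0 = positive, 1 = negative
    exponent: int  # power of two
    mantissa: int  # non-negative integer magnitude

    def value(self):
        return (-1) ** self.sign * self.mantissa * 2.0 ** self.exponent


def exponent_pe(exp_a, exp_b):
    """Exponent PE: choose the common exponent and each operand's shift."""
    common = min(exp_a, exp_b)
    return common, exp_a - common, exp_b - common


def mantissa_pe(a, b, shift_a, shift_b):
    """Mantissa PE: align both mantissas, then add them with their signs."""
    signed_a = (a.mantissa << shift_a) * (-1 if a.sign else 1)
    signed_b = (b.mantissa << shift_b) * (-1 if b.sign else 1)
    total = signed_a + signed_b
    return (1 if total < 0 else 0), abs(total)


def fp_add(a, b):
    """Add two ToyFloat values using the two cooperating PE functions."""
    common, shift_a, shift_b = exponent_pe(a.exponent, b.exponent)
    sign, mantissa = mantissa_pe(a, b, shift_a, shift_b)
    return ToyFloat(sign, common, mantissa)
```

For example, `fp_add(ToyFloat(0, 3, 5), ToyFloat(0, 1, 3))` adds 40 and 6: the exponent step picks the common exponent 1 and the mantissa step adds 20 and 3. A hardware adder would additionally normalize and round the result, which this sketch leaves out.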

Page 20: U.maheswaran Presentation

CONTINUED

Thus, each operation in a loop body is spatially mapped to a dedicated PE.

The main advantage of spatial mapping is that each PE may not need reconfiguration during execution of a loop because of its fixed functionality.

However, it has a disadvantage that spreading all operations of the loop body over the reconfigurable array may require a very large array size.
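The trade-off of spatial mapping can be made concrete with a small sketch: every operation of the loop body gets its own PE, so the mapping fails as soon as the loop body has more operations than the array has PEs. The function name `spatial_map`, the row-major placement, and the operation labels are assumptions for illustration; a real mapper would also consider routing between PEs.

```python
def spatial_map(loop_ops, rows, cols):
    """Assign each uniquely named loop operation to its own PE (row, col).

    Placement is simple row-major order, which is an assumption made
    for this sketch; it ignores routing and data dependences.
    """
    if len(set(loop_ops)) != len(loop_ops):
        raise ValueError("operation labels must be unique")
    if len(loop_ops) > rows * cols:
        raise ValueError(
            f"loop body needs {len(loop_ops)} PEs, array has {rows * cols}"
        )
    return {op: divmod(index, cols) for index, op in enumerate(loop_ops)}
```

Because each PE keeps one fixed operation, the returned placement stays valid for every loop iteration; the cost is that a loop body with five operations cannot be mapped onto a 2 x 2 array at all.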

Page 21: U.maheswaran Presentation

CONTINUED..

The operations that a PE (or a pair of PEs) in our CGRA can execute are classified into three groups.

1) Arithmetic/logical operations: A PE can execute ALU operations in one clock cycle.

2) Multiply/divide/load/store operations: These operations are executed by dedicated functional resources located outside the PE array in several clock cycles.

3) Floating-point operations: A pair of PEs can execute floating-point operations taking several clock cycles.
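The three groups above can be expressed as a small lookup that reports, for each operation, which resource executes it, how many PEs it occupies, and whether it completes in a single cycle. The mnemonics in the three sets are hypothetical; the slides do not list an instruction set, and the exact multi-cycle latencies are not given, so none are assumed here.

```python
# Hypothetical mnemonics for each group (not taken from the slides).
ALU_OPS = {"add", "sub", "and", "or", "xor", "shl", "shr"}
SHARED_OPS = {"mul", "div", "load", "store"}
FLOAT_OPS = {"fadd", "fsub", "fmul", "fdiv"}


def classify(op):
    """Return (group, PEs occupied, completes in one cycle?) for an op."""
    if op in ALU_OPS:
        return ("alu", 1, True)       # one PE, one clock cycle
    if op in SHARED_OPS:
        return ("shared", 0, False)   # resource outside the PE array
    if op in FLOAT_OPS:
        return ("float", 2, False)    # a pair of PEs, several cycles
    raise ValueError(f"unknown operation: {op}")


def pes_required(loop_body):
    """Total PEs a loop body occupies when every op gets its own PE(s)."""
    return sum(classify(op)[1] for op in loop_body)
```

For a loop body of `["add", "fadd", "mul", "sub"]`, the two ALU operations take one PE each, the floating-point add takes a pair, and the multiply uses the shared resource, so `pes_required` reports four PEs.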

Page 22: U.maheswaran Presentation

CONCLUSION

Thus increased performance and flexibility are achieved in both programming and hardware by this mapping process over a reconfigurable device.

Faster, more flexible reconfigurable hardware that is mapped in this way to support floating-point operations can draw good attention in the embedded systems industry.


Page 23: U.maheswaran Presentation

Queries

?

Page 24: U.maheswaran Presentation

AUTHORS

P. Venugopal, M.E. [A.E], M.B.A.

Asst. Professor, Dept. of ECE,

M.N.M. Jain Engineering College,

Chennai, Tamil Nadu, India

[email protected], +91-9444420128

U. Maheswaran, M.E. [A.E]

PG Scholar,

M.N.M. Jain Engineering College,

Chennai, Tamil Nadu, India

[email protected], +91-9944215357
