ENG6530 Reconfigurable Computing Systems: Paper Review Summary


Topics Covered

• Paper Review
• 2014 Conclusion?
• When to use RCS?
• How to use RCS?
• Summary




Paper Review

1. AdaBoost-Based Real-Time Object Detection
• By Ziad Abouwaimer

2. Accelerating LS-SVM with RTR
• By Yin Li

3. Reconfigurable Computing Using Content Addressable Memory
• By Marie, Anderson, Raphael

4. Coarse-Grain Reconfigurable Architectures
• By Shane, Zack, Cristian

5. AES Implementation on FPGAs
• By Daniel, Taras, Natalia

6. FPGA-Based String Matching for Network Processing
• By Justin, Albert

7. Instance-Specific Accelerators for Minimum Covering
• By Desmond, Matt

8. High-Performance Pipelined FPGA GA
• By Grayden, Ganga


Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection

An Article Review by Ziad

• Object detection is an important step in many applications.
• Real-time detection is critical in several domains.
• Parallelism can be exploited easily from the application.
• The proposed architecture is flexible and scalable.
• The proposed architecture can be extended to many other applications.
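Why parallelism is easy to exploit here can be seen from how an AdaBoost-trained detector classifies image windows. The sketch below assumes decision-stump weak classifiers over a precomputed feature vector per window; the stumps, weights and feature values are hypothetical, and the reviewed architecture's actual features and cascade structure are not modeled. The strong classifier is the sign of the alpha-weighted sum of weak votes, and every window is classified independently of the others.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Hypothetical weak classifier: compares one feature against a threshold."""
    feature: int      # index into the window's feature vector
    threshold: float
    polarity: int     # +1: positive when value >= threshold; -1: positive when value <= threshold
    alpha: float      # AdaBoost weight of this weak classifier

    def predict(self, features: Sequence[float]) -> int:
        value = features[self.feature]
        return 1 if self.polarity * (value - self.threshold) >= 0 else -1


def strong_classify(stumps: List[Stump], features: Sequence[float]) -> int:
    """AdaBoost strong classifier: sign of the alpha-weighted sum of weak votes."""
    score = sum(s.alpha * s.predict(features) for s in stumps)
    return 1 if score >= 0 else -1


def detect(stumps: List[Stump], windows: List[Sequence[float]]) -> List[int]:
    # Windows do not depend on each other, so hardware can instantiate one
    # evaluator per window (and evaluate the weak classifiers in parallel too).
    return [strong_classify(stumps, w) for w in windows]


stumps = [Stump(0, 0.5, 1, 0.8), Stump(1, 0.2, -1, 0.4)]
windows = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.6]]
print(detect(stumps, windows))  # [1, -1, 1]
```

In software the list comprehension runs the windows one after another; the point of a parallel architecture is that nothing forces that order.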


Accelerating On-line Training of LS-SVM with Run Time Reconfiguration

An Article Review by Yin Li

• Machine learning is an important tool for solving many problems.
• SVMs are among the most successful techniques, offering high accuracy.
• LS-SVM is a modified version of SVM (standard SVM training is a quadratic optimization problem).
• Training machine learning algorithms is a bottleneck.
• Real-time performance is crucial for many applications, such as driver assistance, laser-guided missiles, UAVs, WSNs, ...
• Dynamic run-time reconfiguration is not necessary!!
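The difference between SVM and LS-SVM is what makes training a candidate for acceleration: in the standard LS-SVM classifier formulation, equality constraints and a squared-error cost replace the quadratic program with a single (N+1)x(N+1) linear system in the bias b and support values alpha. A minimal software sketch, assuming an RBF kernel and illustrative values of gamma and sigma; it does not model the reviewed paper's on-line training scheme or its hardware.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf(x: Vector, z: Vector, sigma: float) -> float:
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma ** 2)


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_lssvm(X: List[Vector], y: List[int], gamma: float,
                sigma: float) -> Tuple[float, List[float]]:
    """LS-SVM classifier training as one linear system instead of a QP.

        [ 0  y^T             ] [ b     ]   [ 0 ]
        [ y  Omega + I/gamma ] [ alpha ] = [ 1 ]   Omega_kl = y_k y_l K(x_k, x_l)
    """
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for k in range(n):
        row = [float(y[k])]
        for l in range(n):
            ridge = 1.0 / gamma if k == l else 0.0
            row.append(y[k] * y[l] * rbf(X[k], X[l], sigma) + ridge)
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, support values alpha


def predict(X, y, b, alpha, x, sigma):
    """Sign of sum_k alpha_k y_k K(x, x_k) + b."""
    s = sum(a * yk * rbf(x, xk, sigma) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1


X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
y = [-1, -1, 1, 1]
b, alpha = train_lssvm(X, y, gamma=10.0, sigma=1.0)
print([predict(X, y, b, alpha, p, sigma=1.0) for p in X])  # [-1, -1, 1, 1]
```

A dense linear solve is a regular, highly parallel workload (the inner elimination loop updates a whole row at once), which is one reason LS-SVM training lends itself to hardware.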


Coarse Grain FPGA Architectures

An Article Review by Shane, Zack and Cristian

• Coarse-grain FPGAs were introduced as an alternative to fine-grain FPGAs, providing multiple-bit-wide data paths and complex operators.
• Direct functional units instead of LUT implementation.
• Massive reduction of configuration time.
• Drastic reduction in the complexity of the P&R (place and route) problem.
• They are suitable for some applications (e.g., communication protocols).
• They are not as flexible as fine/medium-grain FPGAs.


Reconfigurable Computing using Content Addressable Memory for Improved Performance and Resource Usage

An Article Review by Marie, Anderson and Raphael

• This approach deviates from conventional FPGA architectures (LUTs) to avoid design overhead.
• The proposed RC architecture is a memory-based methodology that uses CAM as the underlying reconfigurable fabric (speed).
• The use of CAM leads to a significant reduction in memory requirements compared to a LUT-based approach.
• However, CAM suffers from high power consumption!! Any commercial implementations?? Why?
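The memory saving of a CAM over a LUT can be illustrated with a behavioral model of a ternary CAM used as a logic function. This is a hypothetical example, not the reviewed paper's mapping: entries hold input patterns with don't-care bits ('X'), every entry is compared with the key at once, and a priority encoder returns the output stored with the first hit. The saving only appears when the function has a compact description with don't cares.

```python
from typing import List, Optional, Tuple


class TernaryCAM:
    """Behavioral model of a ternary CAM whose entries use '0', '1' and 'X' (don't care)."""

    def __init__(self, entries: List[Tuple[str, int]]):
        self.entries = entries  # (pattern, stored output); lower index = higher priority

    @staticmethod
    def matches(pattern: str, key: str) -> bool:
        return len(pattern) == len(key) and all(p in ("X", k) for p, k in zip(pattern, key))

    def search(self, key: str) -> Optional[int]:
        # Hardware compares the key against every entry simultaneously (one match
        # line per entry) and a priority encoder picks the first hit; we loop instead.
        match_lines = [self.matches(p, key) for p, _ in self.entries]
        for hit, (_, out) in zip(match_lines, self.entries):
            if hit:
                return out
        return None


# Hypothetical example: 3-input majority function a*b + b*c + a*c.
# A 3-input LUT stores all 2^3 = 8 outputs; this CAM stores 4 entries (3 cubes + default).
majority = TernaryCAM([("11X", 1), ("1X1", 1), ("X11", 1), ("XXX", 0)])
print([majority.search(f"{i:03b}") for i in range(8)])  # [0, 0, 0, 1, 0, 1, 1, 1]
```

Comparing every entry on every search is also why CAMs are fast and power hungry: all match lines switch on each lookup.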


AES Implementation on FPGAs

An Article Review by Daniel, Taras and Natalia

• In the information age, cryptography has become one of the major methods of protection in all applications.
• Cryptographic algorithms are used in embedded systems, desktops, smart cards, wireless sensor networks, …
• For superior, real-time performance, many applications need cryptographic algorithms realized in hardware.
• Lots of parallelism can be exploited in the AES algorithm. AES performs fine-grain (bit- and byte-level) operations, which suit fine-grain FPGAs.
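What "fine-grain operations" means for AES can be seen in the MixColumns step. The sketch below (a software illustration, not the reviewed implementation) computes MixColumns for one 4-byte column using only shifts and XORs in GF(2^8), operations that map directly onto LUTs; the four columns of the AES state are processed independently, which is one source of the parallelism mentioned above.

```python
from typing import List


def xtime(b: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # reduce: XOR with the AES polynomial
    return b


def mul3(b: int) -> int:
    """Multiply by 3 in GF(2^8): 3*b = 2*b XOR b."""
    return xtime(b) ^ b


def mix_single_column(col: List[int]) -> List[int]:
    """AES MixColumns on one 4-byte column (circulant matrix with rows 2 3 1 1)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(v) for v in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

Each output byte is a handful of XOR gates plus a conditional XOR, so an FPGA can compute all 16 output bytes of a round in parallel.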


FPGA Based String Matching for Network Processing Applications

An Article Review by Justin and Albert

• String matching is a very popular technique used in many applications, such as packet classification, packet inspection, bioinformatics, DNA sequences, search engines, …
• Building hardware accelerators for string matching is important, since current packet classification and inspection running on general-purpose processors (GPPs) are slow.
• Content addressable memories (CAMs) are fast, but they are very limited and consume lots of power.
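One reason string matching maps well onto FPGAs is that bit-parallel algorithms such as Shift-And update their entire match state with one shift and one AND per input character; in hardware that is a shift register and a row of AND gates, and several patterns can be matched side by side. The sketch below shows that technique in software; it is not necessarily the method used in the reviewed paper.

```python
from typing import Dict, List


def shift_and(pattern: str, text: str) -> List[int]:
    """Return the start index of every (possibly overlapping) occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # Per-character masks: bit i is set when pattern[i] equals that character.
    masks: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0  # bit i set <=> pattern[0..i] matches the text ending at the current char
    hits = []
    for j, ch in enumerate(text):
        # One shift plus one AND per character: a single clock cycle in hardware.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and("abc", "xabcabcab"))  # [1, 4]
```

The work per character is constant regardless of where matches occur, which gives the fixed, line-rate throughput that packet inspection needs.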


Instance-Specific Accelerators for Minimum Covering

An Article Review by Desmond and Matt

• Set covering is an important problem that can model many other applications: crew scheduling, vertex covering, facility location, SAT.
• Speeding up set covering is crucial (NP-hard problem).
• Significant raw speedup of up to 5 orders of magnitude (small problems!!).
• Scalability might be an issue.
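Why the speedups are reported on small problems becomes clear from the size of the search: an exact minimum cover may have to examine up to 2^m combinations of the m candidate sets. The sketch below is a plain exhaustive search in software with sets encoded as bitmasks (so "union" is a wide OR, the kind of operation hardware does in one step); it does not model the instance-specific accelerator architecture.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def min_set_cover(n_elements: int, sets: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Exact minimum set cover by exhaustive search in order of increasing cover size.

    Each set is a bitmask over n_elements. Returns the indices of the first smallest
    cover found, or None when even all sets together do not cover the universe.
    """
    universe = (1 << n_elements) - 1
    for k in range(1, len(sets) + 1):
        for combo in combinations(range(len(sets)), k):
            covered = 0
            for i in combo:
                covered |= sets[i]
            if covered == universe:
                return combo
    return None


# Elements 0..4; S0={0,1,2}, S1={2,3}, S2={3,4}, S3={0,4}
print(min_set_cover(5, [0b00111, 0b01100, 0b11000, 0b10001]))  # (0, 2)
```

Every added candidate set doubles the worst-case number of combinations, so even orders-of-magnitude speedups buy only a few extra sets of problem size.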


A High-Performance, Pipelined FPGA-Based Genetic Algorithm Machine

An Article Review by Ganga and Grayden

• A meta-heuristic that is population-based rather than single-point-based.
• Parallelism can be exploited easily from the application.
• In addition to parallelism, pipelining can also be applied (population initialization, selection, reproduction, evaluation, replacement).
• It can be used in many applications, from robot path planning and classical optimization problems (set covering) to protein folding.
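The five stages listed above can be sketched in software as separate functions; in a pipelined hardware GA these stages could work concurrently on different individuals, whereas the code runs them in sequence. Everything specific here is an assumption for illustration: a OneMax fitness (count of 1 bits), binary tournament selection, single-point crossover with bit-flip mutation, elitist replacement, and the constants GENES and POP_SIZE. It is not the reviewed machine's design.

```python
import random
from typing import List

GENES = 16     # chromosome length in bits (illustrative)
POP_SIZE = 8   # population size (illustrative)

Individual = List[int]


def initialize(rng: random.Random) -> List[Individual]:
    """Population initialization: random bit strings."""
    return [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]


def evaluate(ind: Individual) -> int:
    """Evaluation: OneMax fitness, a stand-in for a real objective."""
    return sum(ind)


def select(pop: List[Individual], fits: List[int], rng: random.Random) -> Individual:
    """Selection: binary tournament."""
    a, b = rng.randrange(POP_SIZE), rng.randrange(POP_SIZE)
    return pop[a] if fits[a] >= fits[b] else pop[b]


def reproduce(p1: Individual, p2: Individual, rng: random.Random) -> Individual:
    """Reproduction: single-point crossover, then bit-flip mutation (rate 1/GENES)."""
    cut = rng.randrange(1, GENES)
    child = p1[:cut] + p2[cut:]
    return [g ^ 1 if rng.random() < 1.0 / GENES else g for g in child]


def run(generations: int = 30, seed: int = 1) -> List[int]:
    """Runs the GA and returns the best fitness of each generation."""
    rng = random.Random(seed)
    pop = initialize(rng)
    fits = [evaluate(ind) for ind in pop]
    history = [max(fits)]
    for _ in range(generations):
        children = [reproduce(select(pop, fits, rng), select(pop, fits, rng), rng)
                    for _ in range(POP_SIZE)]
        child_fits = [evaluate(c) for c in children]
        # Replacement (elitist): keep the best POP_SIZE of parents and children.
        ranked = sorted(zip(fits + child_fits, pop + children),
                        key=lambda t: t[0], reverse=True)[:POP_SIZE]
        fits = [f for f, _ in ranked]
        pop = [ind for _, ind in ranked]
        history.append(max(fits))
    return history


print(run())
```

Because evaluation of one individual never depends on another, the evaluation stage is where replicated hardware units give the most direct speedup.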


Lessons Learnt

• RCS is a trade-off between traditional hardware (performance) and software (flexibility).
• Orders-of-magnitude performance improvements over traditional software systems.
• Pipelining and SIMD/MIMD-type architectures can enhance the performance of applications.
• Embedded systems with real-time requirements benefit from RCS.
• Algorithms might need to be modified to be mapped to hardware.
• Scalability is an issue that needs to be addressed.
• Input/output throughput might be an issue even when an application can achieve orders-of-magnitude speedup.
• The type of reconfigurable platform (fine/medium/coarse) plays an important role in achieving a specific performance.
• Programming can be done at different levels of abstraction (HDL vs. C/Matlab).


Steps to use RCS


Reconfigurable System (Custom Computing Machine)

Not all applications are suitable for reconfigurable computing:
• Applications that involve extensive recursion, for example, are a poor match because the synthesized "hardware" must be of fixed size.
• Applications with only a small percentage of parallelism (1-5%) will not take advantage of RCS.
• Applications that are I/O bound will also suffer due to memory I/O transfer.
• Applications that require floating-point arithmetic may also be a poor fit.

The first requirement in exploiting RC for HPC applications is determining whether your application is well suited to acceleration.
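The rule of thumb about a small parallel fraction and the warning about I/O-bound applications both follow from Amdahl's law, S = 1 / ((1 - p) + p/s), where p is the fraction of runtime that is accelerated and s is the accelerator's speedup on that fraction. The second function adds a data-transfer term t under a simplifying assumption (transfers are not overlapped with computation); the values of p, s and t below are illustrative.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Amdahl's law: overall speedup when a fraction p of runtime is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def speedup_with_transfer(p: float, s: float, t: float) -> float:
    """Amdahl's law plus a data-transfer term t (fraction of the original runtime).

    Simplifying assumption: transfers to/from the accelerator are not overlapped
    with computation.
    """
    return 1.0 / ((1.0 - p) + p / s + t)


# Even a 1000x accelerator helps little when only 5% of the runtime is parallel,
print(round(amdahl_speedup(0.05, 1000), 2))              # 1.05
# and transfer overhead can erode most of the gain for a highly parallel kernel.
print(round(amdahl_speedup(0.95, 1000), 2))              # 19.63
print(round(speedup_with_transfer(0.95, 1000, 0.2), 2))  # 3.98
```

The serial fraction (1 - p) caps the overall speedup at 1 / (1 - p) no matter how fast the hardware is, which is why profiling comes before any hardware design.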


Considerations

Performance:
• Profiling to decide partitioning
• Which bottlenecks will yield the most performance? (Amdahl's law)
• Memory access and contention
• I/O-bound versus CPU-bound applications
• Synchronization between hardware and software

Consumption of hardware resources:
• Serial or fully parallel implementation
• Floating point or fixed point
• Using a single large FPGA (cost)
• Using several FPGAs (partitioning of the application)
• Utilizing run-time reconfiguration
• Fine grain, medium grain, or coarse grain?

Flexibility and evaluation:
• Hardware description languages
• Electronic system level and design exploration
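The floating-point versus fixed-point choice listed under resource consumption can be illustrated with a Q-format sketch: values are stored as integers scaled by 2^FRAC_BITS, so a multiply becomes an integer multiply plus a shift, which is generally much cheaper in FPGA resources than a floating-point unit. FRAC_BITS = 12 is an illustrative choice, and overflow/saturation of a real word width is not modeled.

```python
FRAC_BITS = 12  # fractional bits (illustrative; e.g. Q3.12 in a 16-bit signed word)


def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a real value to a fixed-point integer, rounding to nearest."""
    return round(x * (1 << frac_bits))


def to_float(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int = FRAC_BITS) -> int:
    """Fixed-point multiply: integer product, then shift right by frac_bits (round half up).

    On an FPGA this is an integer multiplier plus wiring for the shift.
    Overflow and saturation of the target word width are not modeled here.
    """
    return (a * b + (1 << (frac_bits - 1))) >> frac_bits


q = fixed_mul(to_fixed(1.5), to_fixed(-0.3))
print(q, to_float(q))  # -1843 -0.449951171875
```

The result differs from the exact -0.45 by less than one unit in the last place (2^-12); choosing FRAC_BITS is exactly the precision-versus-area trade-off the designer has to make.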


Steps of Mapping Applications to RCS

1. Selecting the most appropriate algorithm
   Complexity: O(n²) or O(n log n), NP-complete, ...
2. Using the most appropriate language (C, C++, Matlab)
   Run the algorithm on a GPP (golden reference model).
3. Profiling (hot spots, bottlenecks, ...)
   Pure hardware or H/S co-design?
   ASIP? Soft core (MicroBlaze), hard core (PowerPC), ARM (Zynq)
   If pure hardware, what type of coupling? Use Amdahl's law (how much speedup?)
4. Compile-time reconfiguration or dynamic run-time reconfiguration?
   Local or global RTR?
5. Using the appropriate design-entry language (VHDL, Verilog, System Generator, ESL)
6. Using the most appropriate platform (Altera, Xilinx: Spartan-6, Virtex-5/6/7, ...)
   Fine-grain, medium-grain, or coarse-grain platform?


1. Most Appropriate Algorithm?

Selecting the most appropriate algorithm (complexity O(n²) or O(n log n), NP-complete, ...):
• An O(n²) algorithm might be more appropriate than an O(n log n) one, since you might be able to exploit more parallelism from the former.
• It might be easier to map an O(n²) algorithm into hardware than a more efficient O(n) or O(n log n) one.
• An O(n log n) algorithm might be memory hungry and require many memory accesses.
• An O(n) algorithm might require resources that are not available on the target FPGA.
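Sorting is a standard illustration of this trade-off. Merge sort does O(n log n) work but its merge steps are sequential and memory heavy; rank (enumeration) sort does O(n²) comparisons, yet every comparison is independent of the others. The sketch below is a software version; whether the O(n²) algorithm wins in hardware depends on n and on the resources available, since n² comparators grow quickly.

```python
from typing import List, Sequence


def rank_sort(a: Sequence[int]) -> List[int]:
    """O(n^2) enumeration ("rank") sort.

    Each element's final position is the number of elements that must precede it
    (ties broken by index). All n^2 comparisons are independent, so a hardware
    version can run them on parallel comparators and sum the results with adder
    trees; this software version performs the same comparisons sequentially.
    """
    n = len(a)
    out = [0] * n
    for i in range(n):
        rank = sum(1 for j in range(n) if a[j] < a[i] or (a[j] == a[i] and j < i))
        out[rank] = a[i]
    return out


print(rank_sort([5, 3, 8, 3, 1]))  # [1, 3, 3, 5, 8]
```

With enough comparators every rank is available after one comparison step plus an adder-tree delay of O(log n), which is the kind of parallelism that makes the "less efficient" algorithm attractive.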


2. Most Appropriate Language?

• C: appropriate for embedded applications; fast; compact.
• C++: reuse; portable; less used in embedded applications.
• Matlab: used extensively by hardware designers; easy to use; toolboxes available; fast to implement, but slow to execute.


3. Profiling and Bottlenecks

• Use an appropriate profiler, preferably one associated with the CAD tool used.
• Are the hotspots and bottlenecks found easily translated to hardware?
• Communication plays an important role. What type of coupling (functional unit, co-processor)?
• Type of processor used (soft, hard, external, ...)
• Metrics to be measured (and their importance): area, power consumption, speed.


4. Compile Time or Run Time Reconfiguration

Compile-time reconfiguration:
• If designers seek an easy, flexible methodology
• If the FPGA can accommodate the design
• If space is an issue, then consider:
  • serial or semi-parallel versus fully parallel implementations
  • H/S co-design

Run-time reconfiguration:
• Designers should expect to spend much more time partitioning their applications (benefits: power, area).
• Temporal partitioning (static partial reconfiguration)
• Spatial partitioning (dynamic partial reconfiguration)
• Some type of operating system (manager, scheduling) is needed.
• Reconfiguration must be fast (ICAP design and others, ...).
• Issues: limited support; no simulation or verification is available.
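Temporal partitioning can be made concrete with a greedy sketch: the application is a task graph, tasks are visited in topological order and packed into the current configuration until the next one would exceed the device capacity, and then a new configuration is started. The task names, areas, capacity and the reconfiguration-time model are hypothetical; a real partitioner would weigh more than area (for example, data passed between configurations).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def temporal_partition(deps: Dict[str, Set[str]], area: Dict[str, int],
                       capacity: int) -> List[List[str]]:
    """Greedy temporal partitioning of a task graph (deps maps task -> predecessors).

    Because tasks are visited in topological order, every task's predecessors sit in
    the same or an earlier configuration, so configurations can be loaded in sequence.
    """
    order = list(TopologicalSorter(deps).static_order())
    parts: List[List[str]] = [[]]
    used = 0
    for task in order:
        if area[task] > capacity:
            raise ValueError(f"task {task} does not fit on the device")
        if used + area[task] > capacity:
            parts.append([])  # start a new configuration
            used = 0
        parts[-1].append(task)
        used += area[task]
    return parts


def total_time(parts: List[List[str]], exec_time: Dict[str, float],
               reconfig_time: float) -> float:
    """Execution time plus one reconfiguration per configuration loaded (simple model)."""
    return sum(exec_time[t] for p in parts for t in p) + reconfig_time * len(parts)


deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
area = {"a": 3, "b": 4, "c": 2, "d": 5}
parts = temporal_partition(deps, area, capacity=8)
print(parts, total_time(parts, {t: 1.0 for t in area}, reconfig_time=10.0))
```

The reconfiguration term grows with the number of configurations, which is why fast reconfiguration (and a manager that schedules it) matters for run-time reconfiguration to pay off.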


5. Design Entry

Hardware description languages (VHDL, Verilog):
• If designers seek a near-optimal design in terms of performance and area, then an HDL is a must!
• Designers should expect to spend quite a bit of time implementing their design.
• Other tools can help reduce this time: System Generator, Core Generator.

Electronic system level (Handel-C, Vivado HLS):
• Allows designers to perform "design space exploration" more easily than HDLs.
• Exploiting parallelism via AutoESL or Handel-C requires verification and adding pragmas or directives.
• Some ESL tools allow for hardware/software co-design.
• Performance will suffer compared to an HDL-based implementation.


6. Platform

Fine/medium-grain FPGAs (Altera, Xilinx):
• The most straightforward way of mapping an application is to use current Xilinx and Altera (fine/medium-grain) FPGAs.
• Advantages: CAD tools are available; verification is easy (Xilinx ISim, Mentor Graphics ModelSim); hardware-in-the-loop verification (System Generator).
• Disadvantages: long compile and recompile times; some difficulty in using dynamic run-time reconfiguration.

Coarse-grain FPGAs:
• If the application requires higher performance and low power consumption, then coarse-grain FPGAs are the route to go.
• Disadvantages: not many platforms are available; CAD tools are limited.


Summary

• Not all applications are suitable for reconfigurable computing. The first requirement in exploiting RC for HPC applications is determining whether the application is suited to acceleration.
• Reconfigurable computing can be well suited to applications such as artificial neural networks, genetic algorithms, cryptography, image processing, simulation, optimization, etc., due to the parallelism that can be exploited in them.
• Several steps have to be followed carefully by designers to ensure they achieve their goals when implementing algorithms on reconfigurable computing (tedious!!).
• There is a need for design exploration to shorten the time designers take to target RCS.
