

Presenter: Ming-Shiun Yang

National Sun Yat-sen University

Embedded System Laboratory

2013/01/21

SAGA : SystemC Acceleration on GPU Architectures

Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE

Sara Vinco (Italy), Debapriya Chatterjee (USA), Valeria Bertacco (USA), Franco Fummi (Italy)

Abstract

SystemC is a widespread language for HW/SW system simulation and design exploration, and thus a key development platform in embedded system design. However, the growing complexity of SoC designs is having an impact on simulation performance, leading to limited SoC exploration potential, which in turn affects development and verification schedules and time-to-market for new designs. Previous efforts have attempted to parallelize SystemC simulation, targeting both multiprocessors and GPUs. However, for practical designs, those approaches fall far short of satisfactory performance. This paper proposes SAGA, a novel simulation approach that fully exploits the intrinsic parallelism of RTL SystemC descriptions, targeting GPU platforms. By limiting synchronization events with ad-hoc static scheduling and separate independent dataflows, we show that we can simulate complex SystemC descriptions up to 16 times faster than traditional simulators.

What’s the problem

Original SystemC simulation:

The scheduler dispatches all processes to one core.

Processing is sequential.

The growing complexity of SoC designs is having an impact on simulation performance.

Related Works

[1,4,9,10] Parallel SystemC: heavy overhead, code modification

Environment

[7] CUDA Programming Guide: general-purpose programming interface

[3] HIFSuite: mapping SystemC to CUDA

This Paper

NVIDIA CUDA Architecture

Compute Unified Device Architecture (CUDA)

An interface for GP-GPU programming.

The GPU is a co-processor capable of executing many threads in parallel.

Mapping SystemC to CUDA:

SystemC -> HIFSuite: sc2hif -> HIF file -> HIFSuite: hif2C -> C file -> CUDA

Proposed Method

SAGA

Exploit scheduling to eliminate the need for frequent synchronization.

Carve out independent dataflows and map them to distinct threads and processors (parallel execution).

[Figure: Traditional Simulator vs. Proposed Simulator (SAGA)]

SystemC -> HIFSuite: sc2hif -> HIF file -> SAGA -> modified HIF file -> HIFSuite: hif2C -> C file -> CUDA

SAGA methodology – Step 1: Constructing the dependency graph
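The slide only names this step, so the following is a minimal sketch under stated assumptions rather than the construction SAGA performs. It assumes each process is described by the set of signals it reads and the set it writes, and that a process depends on every other process that writes a signal it reads. The function name `build_dependency_graph`, the signal names, and the three-process design are hypothetical.

```python
from collections import defaultdict


def build_dependency_graph(processes):
    """Map each process to the processes it depends on.

    `processes` maps a process name to a (reads, writes) pair of signal-name
    sets. This input format is an assumption made for illustration.
    """
    # Index: signal name -> processes that write it.
    writers = defaultdict(set)
    for name, (_reads, writes) in processes.items():
        for signal in writes:
            writers[signal].add(name)

    deps = {}
    for name, (reads, _writes) in processes.items():
        producers = set()
        for signal in reads:
            producers |= writers[signal]
        # Ignore self-loops (a process reading its own output) in this sketch.
        producers.discard(name)
        deps[name] = sorted(producers)
    return deps


# Hypothetical design: P6 consumes the signals produced by P1 and P2.
processes = {
    "P1": ({"in_a"}, {"s1"}),
    "P2": ({"in_b"}, {"s2"}),
    "P6": ({"s1", "s2"}, {"s6"}),
}
deps = build_dependency_graph(processes)
print(deps)
```

Primary inputs such as `in_a` have no writer, so the processes that read only those signals end up with no dependencies and act as the leaves of the graph.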

SAGA methodology – Step 2: Partitioning into concurrent dataflows

Example – step 2

[Figure: step-by-step trace of the queue and the current dataflow list]

Queue: P8

Queue ≠ Empty, pop P8

Queue ≠ Empty, pop P6

Queue ≠ Empty, pop P7

Current dataflow list (final): P8 P6 P7 P1 P2 P3 P4
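The queue-based trace above can be sketched as a breadth-first walk backwards from an output process. This is an illustrative sketch, not the paper's code: the function name `extract_dataflow` is hypothetical, and the edges of the graph are an assumption chosen so that the result matches the final dataflow list on the slide (P8 fed by P6 and P7, P6 fed by P1 and P2, P7 fed by P3 and P4).

```python
from collections import deque


def extract_dataflow(output, deps):
    """Collect every process that the given output process depends on.

    `deps` maps a process to the processes feeding it. Processes are added
    to the dataflow list in the order they are discovered.
    """
    dataflow = [output]
    seen = {output}
    queue = deque([output])
    while queue:  # "Queue != Empty"
        current = queue.popleft()  # "pop"
        for pred in deps.get(current, []):
            if pred not in seen:
                seen.add(pred)
                dataflow.append(pred)
                queue.append(pred)
    return dataflow


# Assumed dependency edges, consistent with the slide's final list.
deps = {
    "P8": ["P6", "P7"],
    "P6": ["P1", "P2"],
    "P7": ["P3", "P4"],
}
dataflow = extract_dataflow("P8", deps)
print(dataflow)  # ['P8', 'P6', 'P7', 'P1', 'P2', 'P3', 'P4']
```

Running the same walk from each output process yields one dataflow per output. A process reachable from several outputs would appear in several dataflows, which is one plausible reading of the "replicated processes" column in the experimental setup.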

SAGA methodology – Step 3: Process levelization and scheduling

Example – step 3

1. Set all leaf nodes to level 0.

2. Set all non-leaf nodes to level -1.

3. If parent level < child level, parent level = child level + 1.

ex. P6’s level < P1’s level, so P6’s level = 0 + 1 = 1 …

[Figure: dependency graph annotated with process levels, initially 0 and -1, finally 0, 1 and 2]
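The levelization rules can be sketched as follows. This is a hedged illustration: it computes each level as one more than the highest level among the processes feeding it, which is the usual longest-path formulation and is assumed here to be what rule 3 converges to. The function names and the graph edges are hypothetical, and the graph is assumed to be acyclic.

```python
def levelize(deps, processes):
    """Assign level 0 to leaves and 1 + max(child level) to every other node."""
    level = {}

    def visit(p):
        if p not in level:
            children = deps.get(p, [])
            # Leaf nodes (no inputs from other processes) get level 0.
            level[p] = 0 if not children else 1 + max(visit(c) for c in children)
        return level[p]

    for p in processes:
        visit(p)
    return level


def schedule_by_level(level):
    """Group processes into stages; stage k holds every process at level k."""
    stages = [[] for _ in range(max(level.values()) + 1)]
    for p in sorted(level):
        stages[level[p]].append(p)
    return stages


# Assumed edges: P8 is fed by P6 and P7, P6 by P1 and P2, P7 by P3 and P4.
deps = {"P8": ["P6", "P7"], "P6": ["P1", "P2"], "P7": ["P3", "P4"]}
processes = ["P1", "P2", "P3", "P4", "P6", "P7", "P8"]

level = levelize(deps, processes)
stages = schedule_by_level(level)
print(stages)  # [['P1', 'P2', 'P3', 'P4'], ['P6', 'P7'], ['P8']]
```

Processes in the same stage have no dependencies on each other, so a static schedule can run each stage in order and only needs to synchronize between stages.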

Experimental setup

Column 3: loc – lines of code

Column 4: Dataflows (#) – number of dataflows produced by the partitioning in step 2

Column 5: Replicated processes / the maximum amount of replication for these processes

SAGA Performance and Speedup

Up to 16 times faster than a traditional SystemC simulator.

Costs of Compilation

HIFSuite: a set of tools and APIs that support modeling and verification of HW/SW systems.

SystemC -> HIFSuite: sc2hif -> HIF file -> SAGA -> modified HIF file -> HIFSuite: hif2C -> C file -> CUDA

Conclusion

The paper proposes a parallel scheduling method for SystemC simulation.

It introduces a novel partitioning technique that carves out independent dataflows and maps them to distinct threads and multiprocessors.

My comments

Translating SystemC to CUDA with HIFSuite takes a long time.

。The authors expect that a mature version could operate directly on SystemC source code (future work).

The paper is good and illustrates its method clearly.

Experimental results

。They achieve their goal (reducing the simulation time).

。Extensive analysis.

。Comparison with other works.