Out-of-Order Parallel Simulation of SystemC Models...

23
Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 1 Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms Presentation at UNSW, 2 March 2016 Rainer Dömer, Guantao Liu, Tim Schmidt {doemer,guantaol,schmidtt}@uci.edu Center for Embedded and Cyber-Physical Systems University of California, Irvine Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 2 Electronic System Level Models Abstract description of a complete system Hardware + Software Key Concepts in System Modeling Explicit Structure Block diagram structure Connectivity through ports Explicit Hierarchy System composed of components Explicit Concurrency Potential for parallel execution Potential for pipelined execution Explicit Communication and Computation Modules Channels and Interfaces Project Context: ESL Design B0 B1 B2 B3 SystemC Model

Transcript of Out-of-Order Parallel Simulation of SystemC Models...

Page 1: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 1

Out-of-Order Parallel Simulationof SystemC Models

on Many-Core Platforms

Presentation at UNSW, 2 March 2016

Rainer Dömer, Guantao Liu, Tim Schmidt{doemer,guantaol,schmidtt}@uci.edu

Center for Embedded and Cyber-Physical Systems

University of California, Irvine

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 2

• Electronic System Level Models– Abstract description of a complete system

– Hardware + Software

• Key Concepts in System Modeling – Explicit Structure

• Block diagram structure

• Connectivity through ports

– Explicit Hierarchy• System composed of components

– Explicit Concurrency• Potential for parallel execution

• Potential for pipelined execution

– Explicit Communication and Computation• Modules

• Channels and Interfaces

Project Context: ESL Design

B0 B1

B2 B3

SystemC Model

Page 2: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 2

Project Context: ESL Design

• Model Validation through Simulation!– Efficient system-level simulation is critical

• Fast, and

• Accurate!

– Complexity of system models grows constantly• Need for speed!

• Parallel Simulation!– Parallelism explicitly specified in model

• System-level Description Language (SLDL)– SystemC [Groetker et. al, 2002]: SC_THREAD, SC_METHOD

– SpecC [Gajski et. al, 2000]: par { }, pipe { }

– Parallel processing available in standard PCs• Multi-core host PCs readily available

• Many-core technology is arriving

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 3

Project Context: Related Work

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 4

Discrete Event Simulation is slow

Modeling Techniques•Transaction-level modeling (TLM).•TLM temporal decoupling.•Savoiu et al. [MEMOCODE’05]•Razaghi et al.[ASPDAC’12]

Hardware-based Acceleration•Sirowy et al. [DAC’10]•Nanjundappa et al. [ASPDAC’10]•Sinha et al. [ASPDAC’12]

SMP Parallel Simulation•Fujimoto. [CACM’90]•Chopard et al. [ICCS’06]•Ezudheen et al. [PADS’09]•Mello et al. [DATE’10]•Schumacher et al. [CODES’11]•Chen et al. [IEEED&T’11]•Yun et al. [TCAD’12]

Distributed Simulation•Chandy et al. [TSE’79]•Huang et al. [SIES’08]•Chen et al. [CECS’11]

Page 3: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 3

Project with Intel: Key Points

• Advanced Parallel SystemC Simulation– Out-of-Order PDES on many-core host platforms

– Maximum compliance with current execution semantics

– Support for parallel execution of virtual platforms

• Introduction of a Dedicated SystemC Compiler– Recoding Infrastructure for SystemC (RISC)

– Advanced static analysis for parallel execution

– Model instrumentation and code generation

• Parallel SystemC Core Library– Out-of-order parallel scheduler, multi-thread safe primitives

– Many-core target platform (e.g. Intel® Xeon Phi™)

• Open Source– Collaboration with Accellera SystemC Language WG

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 5

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 6

Outline

• Advanced Parallel SystemC Simulation– Traditional Discrete Event Simulation (DES)

– Parallel Discrete Event Simulation (PDES)

– Out-of-Order Parallel Discrete Event Simulation (OoO PDES)

• Project Overview– SystemC Compiler and Parallel Simulation Kernel

• Recoding Infrastructure for SystemC (RISC), Segment Graph

• Out-of-order parallel thread scheduling, many-core platforms

– Demo and Experimental Results• Embedded application: Conceptual DVD player

• Highly parallel application: Mandelbrot renderer

– Prototype Implementation• Open source alpha release available

• Concluding Remarks

Page 4: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 4

Out-of-Order PDES Technology

SystemC Simulation must be Fast and Accurate!

• Traditional Discrete Event Simulation (DES)– Reference simulators run sequentially, only one thread at a time

(cooperative multi-threading model)

– Cannot utilize the capabilities of multi- or many-core hosts

• Parallel Discrete Event Simulation (PDES)– Threads run in parallel (if at the same delta cycle and time)

– Simulation-cycles are absolute barriers!

Out-of-order Parallel DE Simulation (OoO PDES)– Threads run in parallel and out-of-order [DATE’12, TCAD’14]

even in different delta and time cycles if there are no conflicts!

– Aggressive, runs maximum number of threads in parallel,but fully preserves DES semantics and model accuracy!

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 7

Discrete Event Simulation (DES)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 8

10:010:1

10:2

20:120:0

20:2

30:0

0:0T:Δth4th2 th3th1• Traditional DES

– Concurrent threads of execution

– Managed by a central scheduler

– Driven by events and time advances• Delta-cycle

• Time-cycle

Partial temporal order with barriers

• Standard Simulator– SystemC reference simulator

uses cooperative multi-threading

A single thread is active at any time!

Cannot exploit parallelism

Cannot utilize multiple cores

Page 5: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 5

Parallel Discrete Event Simulation (PDES)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 9

10:010:1

10:2

20:120:0

20:2

30:0

0:0T:Δth4th2 th3th1• Parallel DES

– Threads execute in parallel iff• in the same delta cycle, and

• in the same time cycle

Significant speed up!

– Synchronous PDES:Cycle boundaries areabsolute barriers!

• Aggressive Parallel DES– Conservative Approaches

• Careful static analysis prevents conflicts

– Optimistic Approaches• Conflicts are detected and addressed

(roll back)

Out-of-Order PDES Technology

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 10

10:010:1

10:2

20:120:0

20:2

30:0

0:0T:Δth4th2 th3th1• Out-of-Order Parallel DES

Breaks synchronization barrier!

– Threads execute in parallel iff• in the same delta cycle, and

• in the same time cycle,

• OR if there are no conflicts!

Allows as many threads in parallelas possible

Significantly higher speedup!• Results at [DATE’12], [IEEE TCAD’14]

Advanced compiler fully preserves…DES execution semantics

Accuracy in results and timing

Page 6: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 6

Out-of-Order PDES Technology

• OoO PDES Key Ideas1. Dedicated SystemC compiler with advanced model analysis

Static conflict analysis based on Segment Graphs

2. Parallel simulator with out-of-order scheduling on many cores Fast decision making at run-time, optimized mapping

• Fundamental Data Structure: Segment Graph– Key to semantics-compliant out-of-order execution [DATE’12]

– Key to prediction of future thread state [DATE’13]• “Optimized Out-of-Order Parallel DE Simulation Using Predictions”

– Key to May-Happen-in-Parallel Analysis [DATE’14]• “May-Happen-in-Parallel Analysis based on Segment Graphs

for Safe ESL Models“ (Best Paper Award)

– Journal publication: “OoO PDES for TLM” [IEEE TCAD’14]• Comprehensive article with HybridThreads extension

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 11

Project Overview and Tool Flow

• Research and Development Tasks1) Dedicated SystemC compiler

(RISC compiler)

2) Parallel SystemCsimulator

3) Performance tuningfor many‐core hosts

4) Virtual Platform (VP)integration

5) Model analysis(may-happen-in-parallel, MHP)

6) Model recoding,transformationand optimization

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 12

SystemCModel

SystemCCompiler

RISC

C++ Compiler

ParallelExecutable

ParallelC++ Model

ParallelSimulation

VPEngine

VirtualPlatformLibrary

VP-basedPrototyping

Many-CoreHost Platform

MHP Analysis

ToolsRISC

MHP Analysis

ToolsRISC

ModelAnalysis

ReportsReportsReports

MHP Analysis

ToolsRISC

RecodingTools

RISC

RecodingTools

RISC

RefinedSystemC

Model

ModelTransformation

andOptimization

RecodingTools

RISCParallel

SystemCLibrary

ParallelSystemCHeaders

Page 7: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 7

Project Status after Year 1 of 3

• Research and Development Tasks CompletedY1 Dedicated SystemC compiler

(RISC compiler)

Y1 Parallel SystemCsimulator

Y1 Performance tuningfor many‐core hosts

Y2 Virtual Platform (VP)integration

Y3 Model analysis(may-happen-in-parallel, MHP)

Y3 Model recoding,transformationand optimization

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 13

SystemCModel

SystemCCompiler

RISC

C++ Compiler

ParallelExecutable

ParallelC++ Model

ParallelSimulation

VPEngine

VirtualPlatformLibrary

VP-basedPrototyping

Many-CoreHost Platform

MHP Analysis

ToolsRISC

MHP Analysis

ToolsRISC

ModelAnalysis

ReportsReportsReports

MHP Analysis

ToolsRISC

RecodingTools

RISC

RecodingTools

RISC

RefinedSystemC

Model

ModelTransformation

andOptimization

RecodingTools

RISCParallel

SystemCLibrary

ParallelSystemCHeaders

Dedicated SystemC Compiler

• RISC Software Stack Recoding Infrastructure for SystemC

– C/C++ foundation

– ROSE compiler infrastructure

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 14

RISC

C/C++ Foundation

ROSE IR

• ROSE Internal Representation

• Explicit support for

• Source code analysis

• Source-to-source transformations

Page 8: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 8

Dedicated SystemC Compiler

• RISC Software Stack Recoding Infrastructure for SystemC

– SystemC InternalRepresentation

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 15

RISC

C/C++ Foundation

ROSE IR

SystemC IR

• Class hierarchy to representSystemC objects

Dedicated SystemC Compiler

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 16

RISC

Segment Graph

C/C++ Foundation

ROSE IR

SystemC IR

• RISC Software Stack Recoding Infrastructure for SystemC

1) Segment Graph construction

2) Segment conflict analysis

Compilation,Simulation

SystemC Compiler

RISCsystemc.h

Model.cpp

Segment GraphConstruction

Parallel AccessConflict Analysis

…Model

_par.cpp

SystemC Model ParallelC++ Model

Seg 2

Seg 1

Seg 3

Seg 4 Seg 5

Seg 6

Segment Graph

Step 1: Build a Segment Graph

Page 9: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 9

Dedicated SystemC Compiler

• Segment Graph– Segment Graph is a directed graph

• Nodes: Segments

Code statements executedbetween two scheduling steps

– Expression statements

– Control flow statements (if, while, …)

– Function calls

• Edges: Segment boundaries

Primitives that trigger scheduler entry– wait(event)

– wait(time)

Segment Graph can be constructed staticallyby the compiler from the model source code

• (see example on next slide)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 17

Seg 2

Seg 1

Seg 3

Seg 4 Seg 5

Seg 6

Segment Graph

Dedicated SystemC Compiler

• Segment Graph Construction– Example: Source code and Segment Graph

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 18

int a;

if(cond) {

int b;

wait(1);

int c;

} else {

int d;

}

int e;

wait(2);

int f;

while(cond) {

int g;

}

int h;

int a;conditionint b;int d;int e;

int c;int e;

int f;conditionint g;int h

Page 10: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 10

Dedicated SystemC Compiler

• Segment Graph Construction:– Support for straight-line code

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 19

void straight(){

x = 42;int xx = 43;int yy;yy;int o = y;

wait(10, SC_NS);

wait();

int kk;

wait();

int oo;}

Segment ID: 0

input_straight.cpp:24 (this) -> x = 42

input_straight.cpp:25 int xx = 43;

input_straight.cpp:26 int yy;

input_straight.cpp:27 yy

input_straight.cpp:28 int o =(this) -> y;

Segment ID: 1 (input_straight.cpp:30)

Segment ID: 2 (input_straight.cpp:32)

input_straight.cpp:34 int kk;

Segment ID: 3 (input_straight.cpp:37)

input_straight.cpp:39 int oo;

Dedicated SystemC Compiler

• Segment Graph Construction:– Support for conditional statements if, if-else, switch-case

(with break)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 20

void if_statement(){

wait();int aaa;if(test) {

int bbb;wait();int ccc;

}int ddd;wait();int eee;

}

Segment ID: 0

Segment ID: 1 (input_if_else.cpp:27)

compilerGenerated:0 (this) -> test

input_if_else.cpp:34 int ddd;

input_if_else.cpp:28 int aaa;

input_if_else.cpp:30 int bbb;

Segment ID: 2 (input_if_else.cpp:31)

input_if_else.cpp:32 int ccc;

input_if_else.cpp:34 int ddd;

Segment ID: 3 (input_if_else.cpp:35)

input_if_else.cpp:36 int eee;

Page 11: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 11

Dedicated SystemC Compiler

• Segment Graph Construction:– Support for loop statements while, do-while, for

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 21

void while_statement(){

wait();int kk;while(test) {

int aa;wait();int bb;

}int cc;wait();int dd;

}

Segment ID: 0

Segment ID: 1 (input_while.cpp:13)

input_while.cpp:14 int kk;

compilerGenerated:0 (this) -> test

input_while.cpp:16 int aa;

input_while.cpp:20 int cc;

Segment ID: 2 (input_while.cpp:17)

input_while.cpp:18 int bb;

compilerGenerated:0 (this) -> test

input_while.cpp:16 int aa;

input_while.cpp:20 int cc;

Segment ID: 3 (input_while.cpp:21)

input_while.cpp:22 int dd;

Dedicated SystemC Compiler

• Segment Graph Construction:– Support for loop statements while, do-while, for

(with break, continue)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 22

void while_continue_statement(){

int kk;while(test){

int aa;wait();int bb;if(test1) {

continue;}int oo;wait();int cc;

}int dd;wait();

} Segment ID: 3 (input_while_continue.cpp:62)

Segment ID: 0

input_while_continue.cpp:49 int kk;

compilerGenerated:0 (this) -> test

input_while_continue.cpp:51 int aa;

input_while_continue.cpp:61 int dd;

Segment ID: 2 (input_while_continue.cpp:58)

input_while_continue.cpp:59 int cc;

compilerGenerated:0 (this) ->; test

input_while_continue.cpp:51 int aa;

input_while_continue.cpp:61 int dd;

Segment ID: 1 (input_while_continue.cpp:52)

input_while_continue.cpp:53 int bb;

compilerGenerated:0 (this) -> test1

input_while_continue.cpp:55 continue;

input_while_continue.cpp:57 int oo;

compilerGenerated:0 (this) -> test

input_while_continue.cpp:51 int aa;

input_while_continue.cpp:61 int dd;

Page 12: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 12

Dedicated SystemC Compiler

• Segment Graph Construction:– Support for function calls f(x), return

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 23

int g1(){

int g_0;wait();int g_1 = 33;if(g_1 == 88) {

int g_2;wait();int g_3 = 44;return 43;int DEAD_CODE;

}int g_4;wait();int g_5;wait();int g_6;int return_value = 2;return return_value;

}

Segment ID: 1 (input_function_calls.cpp:152)

input_function_calls.cpp:153 int bb;

input_function_calls.cpp:154 (this) -> g1();

input_function_calls.cpp:162 int g_0;

Segment ID: 2 (input_function_calls.cpp:163)

input_function_calls.cpp:164 int g_1 = 33;

input_function_calls.cpp:166 g_1 == 88

input_function_calls.cpp:167 int g_2;

input_function_calls.cpp:173 int g_4;

Segment ID: 3 (input_function_calls.cpp:168)

input_function_calls.cpp:169 int g_3 = 44;

input_function_calls.cpp:170 43

input_function_calls.cpp:155 int cc;

Segment ID: 5 (input_function_calls.cpp:176)

input_function_calls.cpp:177 int g_6;

input_function_calls.cpp:178 int return_value = 2;

input_function_calls.cpp:179 return_value

input_function_calls.cpp:155 int cc;

Segment ID: 4 (input_function_calls.cpp:174)

input_function_calls.cpp:175 int g_5;

Segment ID: 6 (input_function_calls.cpp:156)

input_function_calls.cpp:157 int dd;

Segment ID: 0

input_function_calls.cpp:151 int aa;

void f(){

int aa;wait();int bb;g1();int cc;wait();int dd;

}

void main(){ wait();

f();wait();

}

Dedicated SystemC Compiler

• Segment Graph Construction:– Support for recursive function calls

Direct, indirect recursion

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 24

void f(){ wait();

if(xx>0) {wait();g();wait();

}wait();return;

}

void g(){ xx--;

wait();if(xx>0) {

wait();int before_rec;f();int after_rec;wait();

} else {wait();return;

}}

Segment ID: 0

Segment ID: 7 (input_recursive.cpp:163)

Segment ID: 10 (input_recursive.cpp:153)

Segment ID: 9 (input_recursive.cpp:178)

Segment ID: 1 (input_recursive.cpp:151)

input_recursive.cpp:152 (this) -> recursive1();

Segment ID: 2 (input_recursive.cpp:159)

input_recursive.cpp:160 (this) -> xx > 0

Segment ID: 5 (input_recursive.cpp:180)

compilerGenerated:0

Segment ID: 6 (input_recursive.cpp:174)

input_recursive.cpp:175 int before_rec;

input_recursive.cpp:176 (this) -> recursive1()

Segment ID: 4 (input_recursive.cpp:172)

input_recursive.cpp:173 (this) -> xx > 0

Segment ID: 8 (input_recursive.cpp:165)

compilerGenerated:0

input_recursive.cpp:177 int after_rec;

Segment ID: 3 (input_recursive.cpp:161)

input_recursive.cpp:162 (this) -> recursive2();

input_recursive.cpp:171 (this) -> xx--

Page 13: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 13

Step 2:Perform Conflict Analysis

Dedicated SystemC Compiler

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 25

RISC

Segment Graph

C/C++ Foundation

ROSE IR

SystemC IR

• RISC Software Stack Recoding Infrastructure for SystemC

1) Segment Graph construction

2) Segment conflict analysis

Compilation,Simulation

SystemC Compiler

RISCsystemc.h

Model.cpp

Segment GraphConstruction

Parallel AccessConflict Analysis

…Model

_par.cpp

SystemC Model ParallelC++ Model

Seg 2

Seg 1

Seg 3

Seg 4 Seg 5

Seg 6

Segment Graph

Conflict Seg 1 Seg 2 Seg 3

Seg 1 True

Seg 2 True True

Seg 3 True

Seg 3R: a, bW: x, yRW:

Seg 2R: a, bW: xRW: z

Instrumentation!

Dedicated SystemC Compiler

• Segment Conflict Analysis– Need to comply with SystemC LRM [IEEE Std 1666™]

• Cooperative (or co-routine) multitasking semantics– “process instances execute without interruption”

– System designer “can assume that a method processwill execute in its entirety without interruption”

A parallel implementation “would be obligedto analyze any dependencies between processes andconstrain their execution to match the co-routine semantics.”

– Must avoid race conditions when using shared variables! Prevent conflicting segments to be scheduled in parallel

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 26

Conflict Seg 1 Seg 2 Seg 3

Seg 1 True

Seg 2 True True

Seg 3 True

Seg 3R: a, bW: x, yRW:

Seg 2R: a, bW: xRW: z

Page 14: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 14

Dedicated SystemC Compiler

• Segment Conflict Analysis:– Variable access analysis for Read, Write, and Read/Write

– Example:

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 27

class Conflict: public sc_module {SC_CTOR(Conflict){ SC_THREAD(thread1);

SC_THREAD(thread2);}int x, y, z;

void thread1(){

int a;a = 2;wait();a = x + y;wait();z++;

}};

void thread2(){

int b = 2;x = y;wait();x = y * z;wait();z++;wait();x++;

}

Segment ID: 0

conflict.cpp:24 int a;

conflict.cpp:25 a = 2

Segment ID: 3

conflict.cpp:34 int b = 2;

conflict.cpp:35 x = y

Segment ID: 1 (conflict.cpp:26)

conflict.cpp:27 a = x + y

Segment ID: 2 (conflict.cpp:28)

conflict.cpp:29 z++

Segment ID: 4 (conflict.cpp:36)

conflict.cpp:37 x = y * z

Segment ID: 5 (conflict.cpp:38)

conflict.cpp:39 z++

Segment ID: 6 (conflict.cpp:40)

conflict.cpp:41 x++

Segment Graph

Segment ID: 0

conflict.cpp:24 int a;

conflict.cpp:25 a = 2

Segment ID: 3

conflict.cpp:34 int b = 2;

conflict.cpp:35 x = y

Segment ID: 1 (conflict.cpp:26)

conflict.cpp:27 a = x + y

Segment ID: 2 (conflict.cpp:28)

conflict.cpp:29 z++

Segment ID: 4 (conflict.cpp:36)

conflict.cpp:37 x = y * z

Segment ID: 5 (conflict.cpp:38)

conflict.cpp:39 z++

Segment ID: 6 (conflict.cpp:40)

conflict.cpp:41 x++

Dedicated SystemC Compiler

• Segment Conflict Analysis:– Variable access analysis for Read, Write, and Read/Write

– Example:

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 28

Segment GraphSegmentVariable

Accesses

Page 15: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 15

Dedicated SystemC Compiler

• Segment Conflict Analysis:– Variable access analysis for Read, Write, and Read/Write

– Example:

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 29

SegmentVariable

AccessesSegment Data Conflict Table

x

x

SystemC Compiler and Simulator

• Compiler and Simulator work hand in hand!– Compiler performs conservative static analysis

– Analysis results are passed to the simulator

– Simulator can make safe scheduling decisions quickly

Automatic Model Instrumentation Static analysis results are inserted into the source code

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 30

SystemC Simulator

C++Compiler

ParallelSimulation

SystemC Compiler

RISCsystemc.h

Model.cpp

… Source CodeInstrumentation

systemc_par.h

Model_par.cpp

Input Model

ParallelSystemCLibrary

ParallelC++ Model

Model Instrumentation:Segment and Instance IDsSegment Conflict Tables

Time Advance Tables

Page 16: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 16

SystemC Compiler and Simulator

• Compiler and Simulator work hand in hand!– Compiler performs conservative static analysis

– Analysis results are passed to the simulator

– Simulator can make safe scheduling decisions quickly

Automatic Model Instrumentation1) Segment and instance IDs

• Threads identified by creator instance and current code location

2) Data and event conflict tables• Segment concurrency hazards identified by fast table lookup

(filtered for scope, instance path, references and port mapping)

3) Current and next time advance tables• Prediction of future thread states

better scheduling decisions by looking ahead in time(future optimization)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 31

• Simulator kernel with Out-of-Order Parallel Scheduler– Conceptual OoO PDES execution

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 32

Parallel SystemC Simulator

IssueThreads

Issue threads…

• truly in parallel and out-of-order

• whenever they are ready

• and will have no conflicts!

Fast conflict table lookup

Smart thread-to-core mapping (future optimization)

Page 17: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 17

Parallel SystemC Simulator

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 33

• Protection of Inter-Thread Communication– Need to comply with SystemC LRM [IEEE Std 1666™]

• Cooperative (or co-routine) multitasking semantics– Threads can assume execution “without interruption”

– Must protect inter-thread communication in channels!• Primitive SystemC channels

Static protection (special parallel SystemC headers, library)

• User-defined hierarchical channels Dynamic protection through source code instrumentation

Channel

Thread 2Thread 1

SystemC Simulator

Demo and Experimental Results

• Interactive Demonstration– Two Application Examples

• DVD player (conceptual)

• Mandelbrot renderer (embarrassingly parallel)

– Compilation• Static analysis based on segment graph

• Conflict analysis and source code instrumentation

– Simulation• Accellera reference library (Posix-based, sequential)

• RISC simulator library (Posix-based, out-of-order parallel)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 34

ParallelSystemCLibrary

SystemC Compiler

RISCsystemc.h

Model.cpp

Segment GraphConflict Analysis

Source CodeInstrumentation

Input Model

ParallelSimulation

systemc_par.h

Model_par.cpp

Page 18: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 18

Example Model 1: DVD Player

• DVD Player Example (conceptual)– Parallel video and audio decoding with different frame rates

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 35

Video30 FPS

2 Audio Channels38.28 FPS

Multimediainput

stream

DUT

VideoCodec

LeftAudioCodec

RightAudioCodec

DUTVideo

MonitorLeft

SpeakerRight

Speaker

Stimulus

1: SC_MODULE(VideoCodec)2: { sc_port<i_receiver> p1;3:   sc_port<i_sender>    p2;4:   …5:   while(1){6:       p1‐>receive(&inFrm);7:       outFrm = decode(inFrm);8:       wait(33330, SC_US);9:       p2‐>send(outFrm);10:   }11: };

1: SC_MODULE(AudioCodec)2: { sc_port<i_receiver> p1;3:   sc_port<i_sender>    p2;4:   …5:   while(1){6:       p1‐>receive(&inFrm);7:       outFrm = decode(inFrm);8:       wait(26120, SC_US);9:       p2‐>send(outFrm);10:   }11: };

Example Model 1: DVD Player

• DVD Player Example (conceptual)– Parallel video and audio decoding

with different frame rates

1. Real time schedule: fully parallel

2. Reference simulator schedule (DES)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 36

0 26.12

33.33

52.25

66.67

78.38

100

Frame 1 Frame 2 Frame 3

LF 1RF 1

LF 2RF 2

LF 3RF 3

LF 4RF 4

Time [ms]

Video

LeftRight

Video

LeftRight

0

33.33 66.67

78.38

100

Frame 3Frame 1 Frame 2

LF 1RF 1

LF 2RF 2

LF 3RF 3

Time [ms] …52.2526.12

LF 4

DUT

VideoCodec

LeftAudioCodec

RightAudioCodec

DUTVideo

MonitorLeft

SpeakerRight

Speaker

Stimulus

Page 19: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 19

Example Model 1: DVD Player

• DVD Player Example (conceptual)– Parallel video and audio decoding

with different frame rates

1. Real time schedule: fully parallel

3. Synchronous parallel schedule (PDES)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 37

0 26.12

33.33

52.25

66.67

78.38

100

Frame 1 Frame 2 Frame 3

LF 1RF 1

LF 2RF 2

LF 3RF 3

LF 4RF 4

Time [ms]

Video

LeftRight

0 26.12

33.33 66.67

78.38

100

Frame 1 Frame 2 Frame 3

LF 1RF 1

LF 2RF 2

LF 3RF 3

LF 4RF 4

Time [ms]

Video

LeftRight

52.25

DUT

VideoCodec

LeftAudioCodec

RightAudioCodec

DUTVideo

MonitorLeft

SpeakerRight

Speaker

Stimulus

Example Model 1: DVD Player

• DVD Player Example (conceptual)– Parallel video and audio decoding

with different frame rates

1. Real time schedule: fully parallel

4. Out-of-order parallel schedule (OoO PDES)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 38

0 26.12

33.33

52.25

66.67

78.38

100

Frame 1 Frame 2 Frame 3

LF 1RF 1

LF 2RF 2

LF 3RF 3

LF 4RF 4

Time [ms]

Video

LeftRight

0 26.12

33.33

52.25

66.67

78.38

100

Frame 1 Frame 2 Frame 3

LF 1RF 1

LF 2RF 2

LF 3RF 3

LF 4RF 4

Time [ms]

Video

LeftRight

DUT

VideoCodec

LeftAudioCodec

RightAudioCodec

DUTVideo

MonitorLeft

SpeakerRight

Speaker

Stimulus

Page 20: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 20

Example Model 2: Mandelbrot

• Mandelbrot Renderer (Graphics Pipeline Application)– Mandelbrot Set

• Mathematical set of pointsin complex plane

– Two-dimensional fractal shape

• High computation load– Recursive/iterative function

• Embarrassingly parallel– Parallelism at pixel level

– SystemC Model• TLM abstraction

• Parallel slices

• Configurable

• Executable

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 39

Top

Stimulus Monitor

Platform

DUT

din dout

Coordinator

M M M M

Example Model 2: Mandelbrot

• Mandelbrot Renderer (Graphics Pipeline Application) Simulated Graphics Demonstration

(when network delays prevent actual graphical demo)

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 40

Page 21: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 21

Experimental Results

• DVD Player Example (conceptual) Parallel video and audio decoding with different frame rates

– Simulator run times on Intel® Xeon® multi-core host (‘delta’)(1 E3-1240 CPU, 3.4 GHz, 4 cores, 2 way hyper-threaded)

– RISC V0.2.1, Posix-thread based comparison

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 41

Seq Par OoO

10 secstream

Run Time 6.98 s 4.67 s 2.94 s

CPU Load 97% 145% 238%

Speedup 1 x 1.49 x 2.37 x

100 secstream

Run Time 68.21 s 45.91 s 28.13 s

CPU Load 100% 149% 251%

Speedup 1 x 1.49 x 2.42 x

Experimental Results

• Mandelbrot Renderer Example Graphics Pipeline Application, embarrassingly parallel

– Simulator run times on Intel® Xeon® multi-core host (‘phi’)(2 E5-2680 CPUs, 2.7 GHz, 8 cores, 2 way hyper-threaded)

– RISC V0.2.1, Posix-thread based comparison

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 42

ParallelSlices

DES PDES OOO PDES

Run Time

CPU Load

Run Time

CPU Load

SpeedupRun Time

CPU Load

Speedup

1 162.13 s 99% 162.06 s 100% 1.00 x 161.90 s 100% 1.00 x2 162.19 s 99% 96.50 s 168% 1.68 x 96.48 s 168% 1.68 x4 162.56 s 99% 54.00 s 305% 3.01 x 53.85 s 304% 3.02 x8 163.10 s 99% 29.89 s 592% 5.46 x 30.05 s 589% 5.43 x

16 164.01 s 99% 19.03 s 1050% 8.62 x 20.08 s 997% 8.17 x32 165.89 s 99% 11.78 s 2082% 14.08 x 11.99 s 2023% 13.84 x64 170.32 s 99% 9.79 s 2607% 17.40 x 9.85 s 2608% 17.29 x

128 174.55 s 99% 9.34 s 2793% 18.69 x 9.39 s 2787% 18.59 x256 185.47 s 100% 8.91 s 2958% 20.82 x 8.90 s 2964% 20.84 x

Page 22: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 22

Experimental Results

• Mandelbrot Renderer Example Graphics Pipeline Application, embarrassingly parallel

– Simulator run times onIntel® Many Integrated Core (MIC)Architecture

Intel® Xeon Phi™ coprocessor

• 5110P CPU at 1.052 GHz

• 60 cores, 4 way hyper-threaded

• Bidirectional ring interconnect, L2 chache

• Appears as regular Linux machine with 240 cores!

Experimental result:

• Traditional DES vs. synchronous PDES (no conflicts):

• Run time (seq/par): 226.57 sec / 5.50 sec

41x speedup!

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 43

RISC Compiler and Simulator

Out-of-Order Parallel SystemC Compiler and Simulator

• Open Source Prototype Implementation– Alpha Release V0.2.1 published October 30, 2015

• http://www.cecs.uci.edu/~doemer/risc.html

Source tar ball: risc_v0.2.1.tar.gz

Installation: INSTALL, Makefile

Doxygen documentation: RISC API

Doxygen documentation: OoO Parallel SystemC API

BSD license terms: LICENSE

• Downloads and feedback welcome! Code hardening

Extension of supported parallel SystemC subset

Standardization…

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 44

Page 23: Out-of-Order Parallel Simulation of SystemC Models …doemer/publications/Talk_UNSW_20160302.pdf2016/03/02  · Breaks synchronization barrier! – Threads execute in parallel iff

Out-of-Order Parallel Simulation of SystemC Models on Many-Core Platforms

UNSW, 2 March 2016

(c) 2016 R. Doemer, CECS 23

Concluding Remarks

• Project on Advanced Parallel SystemC Simulation– Out-of-Order PDES on many-core host platforms

– Maximum compliance with current execution semantics

• SystemC Compiler Integrated with Parallel Simulator– Segment Graph based static analysis for parallel execution

– Model instrumentation and protection of communication

– Out-of-order parallel scheduler, many-core platform support

• Open Source– RISC V0.2.1, working prototype implementation– Available at www.cecs.uci.edu/~doemer/risc.html

• Ongoing and Future Work– Code hardening and virtual platform integration (i.e. Simics®)

– Collaboration with Accellera SystemC Language WG

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 45

References

• [CECS-TR-15-02] G. Liu, T. Schmidt, R. Dömer:"RISC Compiler and Simulator, Alpha Release V0.2.1: Out-of-Order Parallel SimulatableSystemC Subset", Center for Embedded and Cyber-physical Systems, CECS, April 2015.

• [ASPDAC’15] G. Liu, T. Schmidt, R. Dömer, A. Dingankar, D. Kirkpatrick:"Optimizing Thread-to-Core Mapping on Manycore Platforms with Distributed Tag Directories", Proceedings of ASPDAC, Tokyo, Japan, January 2015.

• [IEEE TCAD’14] W. Chen, X. Han, C. Chang, G. Liu, R. Dömer:"Out-of-Order Parallel Discrete Event Simulation for Transaction Level Models",IEEE Transactions on CAD, vol. 33, no. 12, pp. 1859-1872, December 2014.

• [DATE’14] W. Chen, X. Han, R. Dömer: "May-Happen-in-Parallel Analysis based on Segment Graphs for Safe ESL Models", Proceedings of DATE, Dresden, Germany, March 2014. (Best Paper Award!)

• [DATE’13] W. Chen, R. Dömer: "Optimized Out-of-Order Parallel Discrete Event Simulation Using Predictions", Proceedings of DATE, Grenoble, France, March 2013.

• [IEEE D&T’13] W. Chen, X. Han, C. Chang, R. Dömer: "Advances in Parallel Discrete Event Simulation for Electronic System-Level Design", IEEE Design & Test of Computers, vol. 30, no. 1, pp. 45-54, Jan.-Feb. 2013.

• [DATE’12] W. Chen, X. Han, R. Dömer: "Out-of-Order Parallel Simulation for ESL Design", Proceedings of DATE, Dresden, Germany, March 2012.

• [ASPDAC’12] W. Chen, R. Dömer: "An Optimizing Compiler for Out-of-Order Parallel ESL Simulation Exploiting Instance Isolation", Proceedings of ASPDAC, Sydney, Australia, February 2012.

Out-of-Order Parallel SystemC, UNSW, 2 March 2016 (c) 2016 R. Doemer, CECS 46