Iuliana Bacivarov Computer Engineering and Networks Laboratory, ETH Zürich 1st International Workshop on Multicore Application Debugging (MAD) 2013, 14-15 November 2013, München, Germany How Model-Based Design Simplifies the Debugging of Many-Core Systems

Page 1: How Model-Based Design Simplifies the Debugging of Many-Core Systems

Iuliana Bacivarov

Computer Engineering and Networks Laboratory, ETH Zürich

1st International Workshop on Multicore Application Debugging (MAD) 2013, 14-15 November 2013, München, Germany

Page 2: Acknowledgements

team

Devesh Chokshi, Wolfgang Haid, Kai Huang, Shin-Haeng Kang, Pratyush Kumar, Devendra Rai, Lars Schor, Hoeseok Yang, Prof. Lothar Thiele

projects

EU-SHAPES, EU-PREDATOR, EU-COMBEST, EU-ARTISTDESIGN, EU-PRO3D, EU-EURETILE, nano-tera Extreme, nano-tera UltrasoundToGo

[Figure: Intel SCC (Single-chip Cloud Computer)]

11/15/2013 Iuliana Bacivarov, Computer Engineering Group, ETH Zurich 2

Page 3: Current Embedded Systems are Complex

parallel applications

many-tile/many-core hardware

Intel SCC (48 cores)

Intel Xeon Phi (64 cores)

dynamic workloads

performance, real-time, power, and temperature

high-temperature fault

dynamic mapping

Page 4: Debugging is Hard!

Page 5: Debugging

“Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic hardware, thus making it behave as expected.”

---- Wikipedia

“Debugging tends to be harder when various subsystems are tightly coupled, as changes in one may cause bugs to emerge in another.”

---- Wikipedia

Page 6: Problems with Parallel Programming

What it Feels Like to Use the synchronized Keyword in Java

[Figure: image “borrowed” from an Iomega advertisement for Y2K software and disk drives, Scientific American, September 1999.]

Ed Lee, The Future of Embedded Software, 2006

http://ptolemy.eecs.berkeley.edu/presentations/06/

Page 7: Problems with Parallel Programming

Threads are wildly nondeterministic

The programmer’s job is to prune away the non-determinism by imposing constraints on execution order (e.g., mutexes)

Nontrivial software written with threads, semaphores, and mutexes is incomprehensible to humans

… and doesn’t deliver a rigorous, analyzable, and understandable model of concurrency.

Ed Lee, The Future of Embedded Software, 2006

http://ptolemy.eecs.berkeley.edu/presentations/06/

“Humans are quickly overwhelmed by concurrency and find it much more difficult to reason about concurrent than sequential code. Even careful people miss possible interleavings among even simple collections of partially ordered operations.” H. Sutter and J. Larus. Software and the concurrency revolution. ACM Queue, 3(7), 2005.
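To give a sense of scale for the interleavings mentioned in the quote: two threads that execute m and n atomic steps, with no synchronization between them, can interleave in C(m+n, m) ways. The function below is only an illustrative sketch and is not part of the talk; the name `interleavings` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct interleavings of two independent threads that
// execute m and n atomic steps: the binomial coefficient C(m + n, m).
// Hypothetical helper for illustration; it overflows for large inputs.
std::uint64_t interleavings(unsigned m, unsigned n) {
    std::uint64_t result = 1;
    // Multiplicative formula: after step i, result equals C(n + i, i),
    // so every intermediate division is exact.
    for (unsigned i = 1; i <= m; ++i) {
        result = result * (n + i) / i;
    }
    return result;
}
```

For example, `interleavings(10, 10)` evaluates to 184756, so even two short threads leave far more orderings than a person can check by hand.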

Page 8: Key Concepts in Model-Based Design

Models are composed to form designs.

Models evolve during design.

Specifications are executable models.

Deployed code is generated from models.

Modeling languages have formal semantics.

Modeling languages themselves are modeled.

For general-purpose software, this is about object-oriented design.

For embedded systems, this is about time and concurrency.

Ed Lee, The Future of Embedded Software, 2006

http://ptolemy.eecs.berkeley.edu/presentations/06/

Page 9: The Good News

Model-Based Design enables a ‘correct by design’ execution

[Figure: process network p1, p2, p3 and its execution on a many-core platform drawn as a grid of tiles with routers (R) and memory controllers]

Page 10: The Good News

Distributed Application Layer: model-based design & separation of concerns

[Figure: design flow with the elements application, architecture, design space exploration, analysis, mapping, software synthesis, functional simulation, and execution; process network p1, p2, p3; many-core platform drawn as a grid of tiles with routers (R) and memory controllers]

Page 11: Application Specification: Kahn Process Network

Proposed by Kahn in 1974 as a general-purpose scheme for parallel programming

READ: destructive and blocking

WRITE: non-blocking

FIFO: infinite size

Unique attribute: determinate

Deterministic model of computation

Focus on causality, not order (implementation independent)

Functional behavior is independent of timing (execution time, communication time, scheduling)

Data-driven scheduling: processes run whenever they are ready

[Figure: process network p1, p2, p3]
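The determinacy property can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not DAL code: the names `Fifo`, `Proc`, and `run_network` are hypothetical, the squaring in p2 is an arbitrary stand-in for real work, and a blocking read is modeled by treating a process as "not ready" while its input FIFO is empty. Two different scheduling policies are applied to the same p1 -> p2 -> p3 pipeline.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Unbounded FIFO channel: write never blocks, read is destructive.
struct Fifo {
    std::deque<int> q;
    void write(int v) { q.push_back(v); }
    bool empty() const { return q.empty(); }
    int read() { int v = q.front(); q.pop_front(); return v; }
};

// A process is ready when its (conceptually blocking) read can complete.
struct Proc {
    std::function<bool()> ready;
    std::function<void()> fire;
};

// Run p1 -> p2 -> p3 on n tokens. `pick` chooses which of the ready
// processes fires next and stands in for an arbitrary scheduler.
std::vector<int> run_network(int n, std::function<std::size_t(std::size_t)> pick) {
    Fifo c12, c23;
    std::vector<int> out;
    int next_token = 0;
    std::vector<Proc> procs = {
        // p1: source, emits 0, 1, ..., n-1
        {[&] { return next_token < n; }, [&] { c12.write(next_token++); }},
        // p2: squares each token
        {[&] { return !c12.empty(); }, [&] { int v = c12.read(); c23.write(v * v); }},
        // p3: sink, records what it receives
        {[&] { return !c23.empty(); }, [&] { out.push_back(c23.read()); }},
    };
    for (;;) {
        std::vector<std::size_t> ready;
        for (std::size_t i = 0; i < procs.size(); ++i)
            if (procs[i].ready()) ready.push_back(i);
        if (ready.empty()) break;  // nothing can fire: the network is done
        procs[ready[pick(ready.size())]].fire();
    }
    return out;
}

// Policy 1: always fire the first ready process (the producer runs ahead).
std::vector<int> run_first(int n) {
    return run_network(n, [](std::size_t) { return std::size_t{0}; });
}

// Policy 2: always fire the last ready process (consumers run first).
std::vector<int> run_last(int n) {
    return run_network(n, [](std::size_t k) { return k - 1; });
}
```

Both policies yield the same output sequence even though the FIFO fill levels differ widely between them, which is the sense in which functional behavior is independent of scheduling.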

Page 12: Application Specification: MPEG2 KPN

Kahn process network

Unique attribute: determinate

[Figure: MPEG-2 decoder process network with the processes TG, MERGE, DEMUX, IQ, ZZ, iDCT, LIBU]

Page 13: Execution Scenarios Specification

Application / run-time environment can request a scenario change

Each application can: START, STOP, PAUSE, RESUME

[Figure: state machine over the scenarios below, each labelled with its R and H applications]

stand-by (R: -)

music (R: MP3)

video (R: MPEG-2, AAC)

phone (R: phone)

phone and music (R: phone; H: MP3)

phone and video (R: phone, MPEG-2; H: AAC)

[Figure: process network with the processes TG, MERGE, DEMUX, IQ, ZZ, iDCT, LIBU]
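One way to read the scenario labels is that a scenario change translates into per-application START, STOP, PAUSE, and RESUME actions. The sketch below derives those actions by comparing two scenarios. It assumes that R means running and H means halted (paused), and it allows a change between any two scenarios because the actual transitions of the figure are not recoverable from the transcript. The names `Scenario`, `kScenarios`, and `actions` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One scenario: applications assumed running (R) and halted (H).
struct Scenario {
    std::set<std::string> running;
    std::set<std::string> halted;
};

// Scenario table transcribed from the slide.
const std::map<std::string, Scenario> kScenarios = {
    {"stand-by",        Scenario{}},
    {"music",           Scenario{{"MP3"}, {}}},
    {"video",           Scenario{{"MPEG-2", "AAC"}, {}}},
    {"phone",           Scenario{{"phone"}, {}}},
    {"phone and music", Scenario{{"phone"}, {"MP3"}}},
    {"phone and video", Scenario{{"phone", "MPEG-2"}, {"AAC"}}},
};

// Derive the per-application actions for a change from `from` to `to`.
// Actions are listed in alphabetical order of the application name.
std::vector<std::string> actions(const std::string& from, const std::string& to) {
    const Scenario& a = kScenarios.at(from);
    const Scenario& b = kScenarios.at(to);
    std::set<std::string> apps;
    for (const Scenario* s : {&a, &b}) {
        apps.insert(s->running.begin(), s->running.end());
        apps.insert(s->halted.begin(), s->halted.end());
    }
    std::vector<std::string> out;
    for (const std::string& app : apps) {
        bool wasR = a.running.count(app) > 0, wasH = a.halted.count(app) > 0;
        bool isR = b.running.count(app) > 0, isH = b.halted.count(app) > 0;
        if (isR && wasH) out.push_back("RESUME " + app);
        else if (isR && !wasR) out.push_back("START " + app);
        else if (isH && wasR) out.push_back("PAUSE " + app);
        else if (!isR && !isH && (wasR || wasH)) out.push_back("STOP " + app);
        // An application that becomes halted without having run before is
        // ignored here; the slide does not say how that case is handled.
    }
    return out;
}
```

Under these assumptions, an incoming call during music playback, `actions("music", "phone and music")`, pauses MP3 and starts phone, and hanging up resumes MP3 and stops phone.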

Page 14: Architecture Specification

Hierarchical architecture

e.g., Intel SCC

[Figure: many-core platform drawn as a grid of tiles with routers (R) and memory controllers]

Page 15: Application-to-Architecture Mapping

[Figure: mappings onto the cores c1, c2, c3, c4 for scenario1 and scenario2]

Page 16: Hierarchical Mapping Optimization – via Problem Decomposition

state-based decomposition

architecture-based decomposition

[Figure: scenario graph with scenario1 to scenario4 connected by the events e1 to e8; the scenarios carry process states such as "P1: running", "P1: paused", "P2: running", and "P3: running"]

[ref] S. Kang, H. Yang, L. Schor, I. Bacivarov, S. Ha and L. Thiele, Multi-Objective Mapping Optimization via Problem Decomposition for Many-Core Systems, ESTIMedia, Tampere, Finland, Oct. 2012

[ref] L. Schor, I. Bacivarov, D. Rai, H. Yang, S. Kang and L. Thiele, Scenario-Based Design Flow for Mapping Streaming Applications onto On-Chip Many-Core Systems, CASES, Tampere, Finland, Oct. 2012

Page 17: From Specification to Analysis and Simulations

automatic generation of different system ‘views’

analysis

functional simulation

cycle-/instruction-accurate simulation

execution on hardware

[Figure: the system specification and the views generated from it: MPA analysis model; functional simulation; simulation/execution on core 1 and core 2, each with a Linux kernel and a multi-processing layer, hosting the processes v1, v3 and v4, v2 and connected by an interconnect]

[ref] K. Huang, W. Haid, I. Bacivarov, M. Keller, and L. Thiele. Embedding Formal Performance Analysis into the Design Cycle of MPSoCs for Real-time Multimedia Applications. ACM TECS, Vol. 11, No. 1, pages 8:1-8:23, March, 2012.

[ref] L. Schor, I. Bacivarov, D. Rai, H. Yang, S. Kang and L. Thiele, Scenario-Based Design Flow for Mapping Streaming Applications onto On-Chip Many-Core Systems, CASES, Tampere, Finland, Oct. 2012

Page 18: Runtime System

provides an implementation of the programming interface

inter-process communication (distributed memory)

multi-processing mechanisms

services to manage processes and channels at runtime

[Figure: core 1 and core 2, each with a Linux kernel and a multi-processing layer, connected by a network-on-chip; processes producer, consumer, worker A, worker B]

[ref] L. Schor, D. Rai, H. Yang, I. Bacivarov, and L. Thiele, Reliable and Efficient Execution of Multiple Streaming Applications on Intel's SCC Processor. Runtime and Operating Systems for the Many-core Era (ROME) August 2013.

Page 19: Inter-Process Communication

shared vs. distributed memory

on Intel SCC, RCKMPI lib. for inter-core communication

one listener thread per core for all incoming traffic

virtual buffer at sender to limit traffic

[Figure: core 1 with memory 1 hosting producer and worker; core 2 with memory 2 hosting LISTENER and consumer; the cores are connected by a network-on-chip and communicate via RCKMPI]
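The slide does not say how the sender-side virtual buffer is implemented. One plausible reading is credit-based flow control: the sender may only have a fixed number of unacknowledged tokens in flight. The sketch below shows that idea in isolation. The class `VirtualBuffer` and its methods are hypothetical and are not the DAL or RCKMPI API, and a `std::deque` stands in for the network-on-chip. A real runtime would block or reschedule the sending process instead of returning `false`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Sender-side bound on in-flight tokens (hypothetical class).
class VirtualBuffer {
public:
    explicit VirtualBuffer(std::size_t capacity) : credits_(capacity) {}

    // Try to send one token. Fails without side effects when all
    // credits are used up by tokens the receiver has not yet consumed.
    bool try_send(int token, std::deque<int>& network) {
        if (credits_ == 0) return false;
        --credits_;
        network.push_back(token);
        return true;
    }

    // Called when the receiver reports that it consumed one token.
    void on_ack() { ++credits_; }

    std::size_t credits() const { return credits_; }

private:
    std::size_t credits_;
};

// Demo: capacity 2, three send attempts, one acknowledgement, one more send.
int demo_sent_count() {
    std::deque<int> network;  // stands in for the network-on-chip
    VirtualBuffer vb(2);
    int sent = 0;
    for (int t = 0; t < 3; ++t) sent += vb.try_send(t, network) ? 1 : 0;
    // The third attempt was refused, so two tokens are in flight.
    network.pop_front();      // the receiver consumes a token ...
    vb.on_ack();              // ... and acknowledges it
    sent += vb.try_send(3, network) ? 1 : 0;
    return sent;
}
```

The bound keeps a fast producer from flooding the single listener thread on the receiving core, at the price of one acknowledgement per consumed token in this simple form.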

Page 20: Multi Processing

on top of Linux kernel – processes mapped onto POSIX threads

data-driven execution – no global scheduler required

[Figure: core 1 with Linux kernel and POSIX environment; producer and consumer each run in a POSIX thread]

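// Thread entry point: fire the process repeatedly until it is stopped.
void *producer_thread(void *arg) {
    Process *p = (Process*) arg;
    while (!p->stopped) {
        p->fire();
    }
    return NULL;  // a function returning void* must return a value
}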
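The thread wrapper from this slide can be tried out with a minimal stand-in for `Process`. The slide only shows the members `stopped` and `fire()`; everything else below (`fired`, `limit`, `run_process`, and a process that stops itself after a fixed number of firings) is an assumption made for the sake of a self-contained example.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Hypothetical minimal Process; only `stopped` and `fire()` appear on the slide.
struct Process {
    bool stopped = false;
    int fired = 0;
    int limit;
    explicit Process(int n) : limit(n) {}
    // One firing. This toy process stops itself after `limit` firings.
    void fire() {
        if (++fired >= limit) stopped = true;
    }
};

// Thread entry point in the shape shown on the slide.
void *process_thread(void *arg) {
    Process *p = static_cast<Process *>(arg);
    while (!p->stopped) {
        p->fire();
    }
    return NULL;
}

// Run one process in its own POSIX thread and return its firing count,
// or -1 if the thread could not be created.
int run_process(int limit) {
    Process p(limit);
    pthread_t tid;
    if (pthread_create(&tid, NULL, process_thread, &p) != 0) return -1;
    // Joining makes the thread's writes visible before `fired` is read.
    pthread_join(tid, NULL);
    return p.fired;
}
```

Here `stopped` is only touched by the worker thread, so a plain `bool` suffices. If another thread, such as a runtime manager, were to set it, the flag would need to be atomic or otherwise synchronized.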

Page 21: Runtime Manager

specified as a process network

one master process: manages dynamic execution

one slave process per core: manages processes and channels

[Figure: core 1, core 2, and core 3 connected by a network-on-chip; master (M), two slaves (S), and the processes producer and consumer]

1. install processes

2. create FIFO(s)

3. start processes
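The three steps above can be sketched as a command list that a master builds and hands to the per-core slaves. This is an illustration only: `ProcSpec`, `FifoSpec`, `plan`, and the textual command format are hypothetical, and the placement of producer and consumer on particular cores is an assumption, since the figure's placement is not recoverable from the transcript.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical descriptions of what the master has to set up.
struct ProcSpec { std::string name; int core; };
struct FifoSpec { std::string from, to; };

// Build the command list in the order given on the slide:
// 1. install processes, 2. create FIFO(s), 3. start processes.
std::vector<std::string> plan(const std::vector<ProcSpec>& procs,
                              const std::vector<FifoSpec>& fifos) {
    std::vector<std::string> cmds;
    for (const ProcSpec& p : procs)
        cmds.push_back("core " + std::to_string(p.core) + ": install " + p.name);
    for (const FifoSpec& f : fifos)
        cmds.push_back("create FIFO " + f.from + " -> " + f.to);
    // Processes start only after every FIFO exists, so no process can
    // read from or write to a channel that has not been created yet.
    for (const ProcSpec& p : procs)
        cmds.push_back("core " + std::to_string(p.core) + ": start " + p.name);
    return cmds;
}
```

Keeping the start phase last is the point of the ordering: a process that fires before its channels exist would have nothing to block on.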

Page 22: Synthesis Backend

mapping optimization

Process A --> core 1

Process B --> core 2

Process C --> core 2

process network synthesis: fire(){ read(...); ...}, process wrappers

runtime-manager synthesis: master (M) and slaves (S), main (for each core), Makefile

target platforms

functional simulation on Linux

multi-cluster system: each Linux server forms one cluster with multiple cores; inter-cluster communication with MPI

Intel SCC

QUonG platform (INFN)

[Figure: synthesis flow from the process network A, B, C to the target platforms: Intel SCC (Single-chip Cloud Computer) and a platform with DNP, RISC, DSP, MEM, and APEnet+]

Page 23: Deployment

DAL is available:

www.tik.ee.ethz.ch/~euretile/dal.php

Page 24

complete design flow

easy debugging

predictability

safety, dynamism

safe execution

execution, scalability

optimality

KPN - deterministic MoC

[Figure: collage of earlier figures: process network p1, p2, p3; core 1 with Linux kernel and multi-processing; target platforms Intel SCC (Single-chip Cloud Computer) and a platform with DNP, RISC, DSP, MEM, and APEnet+; plot with the axes processor fitness and cluster fitness showing coverage of A (A0, A1, A2) and coverage of B (B0, B1, B2)]