Models of Architecture

Nokia Bell Labs 2018
Maxime Pelcat
INSA Rennes, IETR, Institut Pascal
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732105: CERBERO.

INSA Rennes – IETR VAADER
• INSA Rennes
• IETR VAADER
• Institut Pascal

Models of Architecture
• Abstracting computational architecture to
– Predict performance
– Support current hardware evolutions

Motivation: architecture evolution
• Hardware architectures are becoming
– More complex
– More heterogeneous
– More High Performance embedded Computing (HPeC)
  • Embedded deep learning, near-sensor computing, fog computing, edge computing, many-cores, etc.
  • Real-time constraints, stream processing applications

Motivation: HPeC architectures
• Let’s look at ARM-based HPeC
– Let us consider 4 heterogeneous solutions
• ARM = control path + some of the data path
• In red: data path
[Figure: the four solutions: Multi-ARM; Multi-ARM + GPGPU; Multi-ARM + DSP; Multi-ARM + FPGA]

Motivation: HPeC architectures
• ARM big.LITTLE: Samsung Exynos 5422
– High-performance cores: 4 × A15, 2GHz, 2MB
– Low-energy cores: 4 × A7, 1.4GHz, 0.5MB
– One SCU per cluster, ACE
– 2GB DDR (PoP)
– Easy to program: Linux SMP, thread migration
– 12Gflops, <10W

Motivation: HPeC architectures
• Multi-ARM + GPGPU: Nvidia Jetson TX1 module
– Control path: 4 × A57, 1.6GHz, SCU
– Data path: 256-core Maxwell GPGPU, 32 cores/warp
– H.264 4K 60Hz
– 4GB external DDR on module
– Less easy to program: Linux SMP + CUDA/OpenCL

Motivation: HPeC architectures
• Multi-ARM + DSP: Texas Instruments Keystone II TCI6638K2K
– Control path: 4 × A15, 1.4GHz, SCU, 4MB
– Data path: 8 × C66, 1.2GHz, 1MB each; FFTC
– Teranet, 6MB MSMC
– Difficult to program (well): Linux SMP + Open Event Machine
– 160 Gflops, <15W

Motivation: HPeC architectures
• Multi-ARM + FPGA: Xilinx Zynq UltraScale+
– Control path: 4 × A53, 1.5GHz, SCU, 1MB; 2 × R5
– Data path: FPGA, up to 4MB, 1MFF, 0.5MLUT
– GPU (not GPGPU)
– Switch fabric
– 600MHz
– More difficult to program (well): Linux SMP + HLS or HDL

Motivation: HPeC architectures
• Current trends
– FPGAs are gaining importance: what about flops?
– Adding video/image accelerators
  • Video compression: H.264/AVC, H.265/HEVC, etc.
  • AI: tensor applications reach 1 Tops/W
– RISC-V as an open HW competitor to ARM

Motivation: architecture evolution
• Towards more complexity
– More cores, hierarchies of clusters
– Heterogeneity, interconnect complexity
• Reminiscent of the intra-core modifications of the 20th century
[Figure: intra-core parallelism: clocked ALUs; SIMD (+ + + +); VLIW (Ld × + Str)]

Motivation: architecture evolution
• But there are some differences between intra-core and inter-core parallelism
– At coarse grain, PEs communicate asynchronously
– There is no (or less) centralized processing decision
– There is no performance portability (nothing equivalent to C-to-VLIW compilers)
• How can/should we manage this HW complexity?
– Can we predict performance at design time? How?

System Objectives
[Figure: system objectives: T°C, energy, reliability, memory, unit cost ($), security, maintenance cost ($), performance, peak power]

System Design: Y-Chart
[Figure: Y-chart. Labels: Algorithm/Application, Architecture, Design, System Prototype, and three Redesign feedback arrows]

Model-Based Design
[Figure: model-based Y-chart. The algorithm model conforms to a Model of Computation (MoC); the architecture model conforms to a Model of Architecture (MoA); KPI evaluation of the two models yields a KPI; Redesign arrows feed back]

On MoC Side: Many Results
• #EdwardALee, #ProgrammingParadigms
• Discrete Event MoCs
• Finite State Machines, imperative languages
• Functional MoCs
• Petri Nets
• Dataflow MoCs: SDF, CSDF, IDF, IBSDF, PSDF, SPDF, PiSDF, etc.

PREESM

Dataflow MoCs Case
And they are not all here…

Feature SDF ADF IBSDF DSSF PSDF PiSDF SADF SPDF DPN KPN

Expressivity Low Med. Turing complete

Hierarchical X X X X

Compositional X X X

Reconfigurable X X X X X X

Statically schedulable X X X X

Decidable X X X X (X) (X) X (X)

Variable rates X X X X X X X

Non-determinism X X X

SDF: Synchronous Dataflow
ADF: Affine Dataflow
IBSDF: Interface-Based Dataflow
DSSF: Deterministic SDF with Shared FIFOs
PSDF: Parameterized SDF
PiSDF: Parameterized and Interfaced SDF
SADF: Scenario-Aware Dataflow
SPDF: Schedulable Parametric Dataflow
DPN: Dataflow Process Network
KPN: Kahn Process Network

But Still a Lot to Do
• On Real-Time Multicore systems especially
• Usually, RT application specification =
– Multiple tasks sharing resources
– Activation periods or triggering events
• Objective = keeping resources busy
[Figure: tasks T1, T2, T3]

MoCs are not sufficient
[Figure: energy evaluation attempted from the algorithm side only. Labels: Algorithm, Algorithm Model (conforms to a Model of Computation, MoC), Energy Evaluation, Energy]

Models of Architecture
[Figure: model-based Y-chart. The algorithm model conforms to a Model of Computation (MoC); the architecture model conforms to a Model of Architecture (MoA); KPI evaluation of the two models yields a KPI; Redesign arrows feed back]

Problem: Predict System Quality
• How to predict a system's "quality"?
– Efficiently (simple procedure)
– Early (from abstract models)
– Accurately (with a good fidelity)
– With reproducibility (same models = same prediction)

Model of Architecture
• Definition
– Model of a Non-Functional Property (NFP) of a system
– Application-independent
– Abstract
– Reproducible

Pelcat, M; Mercat, A; Desnos, K; Maggiani, L; Liu, Y; Heulot, J; Nezan, J-F; Hamidouche, W; Ménard, D; Bhattacharyya, S (2017) "Reproducible Evaluation of System Efficiency with a Model of Architecture: From Theory to Practice", IEEE TCAD.

Model of Architecture
[Figure: existing architecture models placed against the three properties (reproducible, application-independent, abstract): AADL, MCA SHIM, UML MARTE, AAA, CHARMED, S-LAM, MAPS, LSLA]

Model of Architecture
• One and always the same quality evaluation
• Model H conforms to MoA, Model G conforms to MoC
• MoA depends on MoC
[Figure: NFP = MoA( ) activity( ), with the activity obtained from MoC(application). NFPs shown: performance, power, energy, memory, T°C, reliability, security, cost]

Model of Architecture
[Figure: MoC, activity (Act), MoA, KPI]

LSLA: First MoA
• LSLA = Linear System-Level Architecture Model
• Motivated by the additive nature of energy consumption

Energy/Power Define Architecture
[Figure: power axis (0, 2W, 7W, 20W, 20kW, 20MW) with the thresholds "need a dissipator" and "need a fan", and the system classes: embedded system, dedicated or conventional system, HPC. HPeC is marked as an influence]

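LSLA Model of Architecture
[Figure: an SDF graph (Task1 to Task5, connected by signals, all rates 1) mapped onto an LSLA model made of PE1, PE2 and a CN with linear costs 10x+1, 2x+0 and 3x+0. Tasks produce tokens and signals produce quanta; the resulting costs add up: 16+12+22=50. The model is compositional]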

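LSLA Model of Architecture
[Figure: an SDF graph (Task1 to Task5, connected by signals, all rates 1) and an LSLA model (PE1, PE2, CN; costs 10x+1, 2x+0, 3x+0; 16+12+22=50), with the three layers named: SDF is the Model of Computation, the tokens and quanta are the activity, LSLA is the Model of Architecture]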
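The additive computation shown in the LSLA example (16+12+22=50) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each component charges a linear cost alpha * x + beta per token or quantum of size x, and that the total is the plain sum over components. The component names and the token counts are hypothetical, picked only so that the three terms match the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCost:
    """Cost of one token (or quantum) of size x on a component: alpha * x + beta."""
    alpha: float
    beta: float

    def of(self, size: float) -> float:
        return self.alpha * size + self.beta


def lsla_cost(model, activity):
    """Sum the linear cost of every token/quantum; returns (per-component costs, total)."""
    per_component = {
        name: sum(model[name].of(size) for size in sizes)
        for name, sizes in activity.items()
    }
    return per_component, sum(per_component.values())


# The three cost functions shown on the slide. Component names are hypothetical.
model = {
    "c_2x": LinearCost(2, 0),
    "c_3x": LinearCost(3, 0),
    "c_10x_plus_1": LinearCost(10, 1),
}

# Hypothetical activity: sizes of the tokens/quanta received by each component,
# chosen only so that the terms reproduce 16 + 12 + 22 from the slide.
activity = {
    "c_2x": [1] * 8,
    "c_3x": [1] * 4,
    "c_10x_plus_1": [1] * 2,
}

per_component, total = lsla_cost(model, activity)
print(per_component, total)
```

Because the total is a sum of independent per-component terms, adding a component or a token only adds a term, which is what makes such a model compositional.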

LSLA MoA for Energy Prediction
• 86% fidelity on an octo-core ARM
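The slide does not define fidelity. A minimal sketch, assuming it is the share of configuration pairs that the model ranks in the same order as the measurements; the function names and all figures below are hypothetical and are not results from the talk.

```python
from itertools import combinations


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fidelity(predicted, measured):
    """Fraction of pairs (i, j) ordered identically by predictions and measurements."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long series with at least two points")
    pairs = list(combinations(range(len(predicted)), 2))
    agreeing = sum(
        sign(predicted[i] - predicted[j]) == sign(measured[i] - measured[j])
        for i, j in pairs
    )
    return agreeing / len(pairs)


# Hypothetical energy figures (in joules) for four candidate configurations.
measured = [10.0, 12.0, 15.0, 11.0]
predicted = [8.0, 9.0, 12.0, 9.5]

score = fidelity(predicted, measured)
print(f"fidelity = {score:.2f}")
```

In this made-up data every prediction is off by 1.5 J or more, yet five of the six pairs are ranked correctly: a model can be inaccurate in absolute terms and still order design choices well.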

LSLA MoA for Energy Prediction
• The model is learnt from energy measurements
[Figure: LSLA model with eight PEs and three CNs]

LSLA MoA for Energy Prediction
• The model is learnt from energy measurements
[Figure: learnt LSLA model with eight PEs (four at 1.5W, four at 0.3W) and three CNs, with parameters α, β and γ]
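The slides do not say how the parameters are learnt. One simple possibility, given here only as an assumption, is an ordinary least-squares fit of a component's linear cost alpha * x + beta against measured energy. The function name and the measurements below are hypothetical.

```python
def fit_linear(sizes, energies):
    """Ordinary least-squares fit of energy = alpha * size + beta (closed form)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(energies) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, energies))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


# Hypothetical measurements: token size versus measured energy for one component.
sizes = [1, 2, 3, 4]
energies = [3.5, 5.5, 7.5, 9.5]

alpha, beta = fit_linear(sizes, energies)
print(f"learnt cost: {alpha:.2f} * x + {beta:.2f}")
```

Real measurements would be noisy, so the fitted line would only approximate them; the made-up data here lies exactly on one line to keep the result easy to check by hand.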

LSLA: MoA, not MoHW
• LSLA models HW + communication libraries + scheduler + OSs + …
• LSLA models the service the platform offers to the applications
• Top-down approach
– Learning parameters from experiments

System Objectives
[Figure: system objectives: T°C, energy, reliability, memory, unit cost ($), security, maintenance cost ($), latency, peak power]

MoAs: Limits of LSLA
• Energy: linear model OK
• Latency: does not have an additive nature
[Figure: Task1 followed by Task2: latency = sum. Task1 and Task2 side by side: latency = max]
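The contrast between the two cases can be written out as follows, with l_1 and l_2 standing for the latencies of Task1 and Task2, and e_1 and e_2 for their energies (symbols introduced here for illustration only):

```latex
\[
L_{\text{sequential}} = l_1 + l_2,
\qquad
L_{\text{parallel}} = \max(l_1, l_2),
\qquad
E = e_1 + e_2 \ \text{in both cases.}
\]
```

Energy stays a sum whichever way the tasks are arranged, which is why a purely linear model fits energy but not latency.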

Activity & MoA for Latency
[Figure, panels a) to c): SDF graph with Task1 to Task5 connected by signals, all rates 1]

Activity & MoA for Latency
[Figure, panels a) to c): MaxPlus evaluation on PE1, PE2 and CN (costs 10x+1, 2x+0, 3x+0): Σ 12+12+11=35 and Σ 8+6+11=25, then max(35,25)=35]
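The max-plus evaluation on this slide can be reproduced with a few lines. This sketch assumes the two sums on the slide are two concurrent chains of activity, so that costs add along a chain and the overall latency is the largest chain; the per-step costs are copied from the slide, and the names are hypothetical.

```python
def chain_latency(step_costs):
    """Costs add up along one chain of dependent steps (the 'plus' part)."""
    return sum(step_costs)


def overall_latency(chains):
    """Concurrent chains finish when the slowest one does (the 'max' part)."""
    return max(chain_latency(chain) for chain in chains)


# Per-step costs taken from the slide: 12+12+11 and 8+6+11.
chains = [
    [12, 12, 11],
    [8, 6, 11],
]

chain_sums = [chain_latency(chain) for chain in chains]
latency = overall_latency(chains)
print(chain_sums, latency)
```

An additive evaluation of the same costs would give 35 + 25 = 60, which would overestimate the latency if the chains really do run concurrently.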

Activity & MoA for Latency
[Figure: PE1, PE2 and CN (costs 10x+1, 2x+0, 3x+0): Σ 24]

Accuracy? No, Fidelity!
[Figure: Y-chart. Labels: Algorithm/Application, Architecture, Design, System Prototype, and three Redesign feedback arrows]

Current Activities


H2020 CERBERO
• Cross-layer Design of Reconfigurable Cyber-Physical Systems

CERBERO System Adaptation


H2020 CERBERO Toolchain
[Figure: toolchain layers: end-user interaction; system model; application/architecture; runtime support; low-level implementation (hardware abstraction). Also shown: intermediate representation, C++. Tools: VT, AOW, DynAA, PAPI, SPIDER, JADE, ARTICO3, MDC, PREESM, MECA]

GdR SOC2
• Groupement de recherche (research group) SOC2
– Systems on a Chip, Connected Systems
– Industrial partner club

GdR SOC2


SAMOS XIX
• 19th edition of the SAMOS Conference
• July 7-11, submission in March

Takeaway Message
• MoAs can predict performance/quality early
– Especially for HPeC systems
• MoAs are not HW models
– They model HW + protocols + OS + …
• MoAs are built/learnt top-down
– They can and should be simple
• The need for MoAs may rise
– Due to Fog/Edge Computing complexity
[Figure: MoC, activity (Act), MoA, KPI]

Questions?

Pelcat, M; Mercat, A; Desnos, K; Maggiani, L; Liu, Y; Heulot, J; Nezan, J-F; Hamidouche, W; Ménard, D; Bhattacharyya, S (2017) "Reproducible Evaluation of System Efficiency with a Model of Architecture: From Theory to Practice", IEEE TCAD.

www.cerbero-h2020.eu

http://preesm.org