Models of Architecture
Transcript of Models of Architecture
Models of Architecture
Nokia Bell Labs 2018
Maxime Pelcat INSA Rennes, IETR, Institut Pascal
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732105: CERBERO.
• Abstracting computational architecture to –Predict performance
–Support current hardware evolutions
Models of Architecture
• Hardware Architectures are becoming –More complex
–More heterogeneous
–More High Performance embedded Computing (HPeC) • Embedded deep learning, near-sensor computing, fog
computing, edge computing, many-cores, etc.
• Real-time constraints, stream processing applications
Motivation: architecture evolution
• Let’s look at ARM-based HPeC –Let us consider 4 heterogeneous solutions
• ARM = control path + some of the data path
• in red: data path
Motivation: HPeC architectures
Multi-ARM FPGA Multi-ARM
GPGPU Multi-ARM DSP Multi-ARM
• Let’s look at ARM-based HPeC
Motivation: HPeC architectures
Multi-ARM FPGA Multi-ARM
GPGPU Multi-ARM DSP Multi-ARM
• ARM big.LITTLE: Samsung Exynos 5422
Motivation: HPeC architectures
A15 A15
A15 A15
SCU ACE
A7 A7
A7 A7
SCU
2MB 0.5MB
2GB DDR (PoP)
Easy to program Linux SMP Thread migration 12Gflops <10W
Low energy cores
High Performance
cores
2GHz 1.4GHz
• Let’s look at ARM-based HPeC
Motivation: HPeC architectures
Multi-ARM FPGA Multi-ARM
GPGPU Multi-ARM DSP Multi-ARM
• Multi-ARM + GPGPU: Nvidia Jetson TX1 module
Motivation: HPeC architectures
A57 A57
A57 A57
SCU 4GB
external DDR on module
Less easy to program Linux SMP + CUDA/OpenCL
32 cores /warp
Control path
1.6GHz
256-core Maxwell GPGPU
Data path
H.264 4K 60Hz
• Let’s look at ARM-based HPeC
Motivation: HPeC architectures
Multi-ARM FPGA Multi-ARM
GPGPU Multi-ARM DSP Multi-ARM
• Multi-ARM + DSP: Texas Instruments Keystone II TCI6638K2K Motivation: HPeC architectures
A15 A15
A15 A15
SCU 4MB
Difficult to program (well) Linux SMP + Open Event Machine 160 Gflops <15W
Control path
1.4GHz
Data path
Teranet
C66
1MB
C66
1MB
C66
1MB
C66
1MB
C66
1MB
C66
1MB
C66
1MB
C66
1MB
FFTC
6MB MSMC
1.2GHz
• Let’s look at ARM-based HPeC
Motivation: HPeC architectures
Multi-ARM FPGA Multi-ARM
GPGPU Multi-ARM DSP Multi-ARM
• Multi-ARM + FPGA: Xilinx Zynq Ultrascale + Motivation: HPeC architectures
A53 A53
A53 A53
SCU 1MB
More difficult to program (well) Linux SMP + HLS or HDL
Control path
1.5GHz
Data path GPU
FPGA
R5 R5
Switch fabric
Not GPGPU
Up to 4MB 1MFF 0.5MLUT
600MHz
Motivation: HPeC architectures
• Current trends – FPGAs are gaining importance: what about flops?
– Adding video/image accelerators • Video Compression: H.264/AVC, H.265/HEVC, etc.
• AI: For tensor applications reach 1Tops/W
–RISC-V as an open HW competitor to ARM
• Towards more complexity –More cores, hierarchies of clusters
–Heteronegeneity, Interconnect complexity
• Reminds intra-core modifications in XXth
Motivation: architecture evolution
ALU clk clk clk + + + + Ld × + Str
SIMD VLIW
• But there are some differences between intra-core and inter-core parallelism –At coarse grain, PEs communicate asynchronously – There is no (or less) centralized processing decision – There is no performance portability (nothing equivalent
to C-to-VLIW compilers)
• How can/should we manage this HW complexity? –Can we predict performance at design time? How?
Motivation: architecture evolution
System Objectives
Maxime Pelcat 17
T°C
Energy
Reliability
Memory
Unit Cost $
Security Maintenance Cost
$
Performance
Peak Power
System Prototype
System Design: Y-Chart
Maxime Pelcat 18
Architecture
Design
Algorithm Application
Redesign
Red
esign
Model of Architecture (MoA)
conform to
Model-Based Design
19
KPI
Architecture Model
KPI Evaluation
Algorithm Algorithm Model
Red
esign
Maxime Pelcat
Model of Computation(MoC)
conforms to
Red
esign
On MoC Side: Many Results • #EdwardALee, #ProgrammingParadigms • Discrete Event MoCs • Finite State Machines imperative languages • Functional MoCs • Petri Nets • Dataflow MoCs SDF, CSDF, IDF, IBSDF, PSDF,
SPDF, PiSDF, etc.
20 Maxime Pelcat
PREESM
And they are not all here… Dataflow MoCs Case
Feature SDF ADF IBSDF DSSF PSDF PiSDF SADF SPDF DPN KPN
Expressivity Low Med. Turing complete
Hierarchical X X X X
Compositional X X X
Reconfigurable X X X X X X
Statically schedulable X X X X
Decidable X X X X (X) (X) X (X)
Variable rates X X X X X X X
Non-determinism X X X
SDF: Synchronous Dataflow ADF: Affine Dataflow IBSDF: Interface-Based Dataflow DSSF: Deterministic SDF with Shared Fifos PSDF: Parameterized SDF
PiSDF Parameterized and Interfaced SDF SADF: Scenario-Aware Dataflow SPDF: Schedulable Parametric Dataflow DPN: Dataflow Process Network KPN: Kahn Process Network
But Still a Lot to Do • on Real-Time Multicore systems especially
• Usually, RT application specification = –Multiple tasks sharing resources
–Activation periods or triggering events
• Objective = keeping resources busy
22 Maxime Pelcat
T1 T2
T3
MoCs are not sufficient
23
Energy
Energy Evaluation
Algorithm Algorithm Model
Maxime Pelcat
Model of Computation(MoC)
conforms to
Models of Architecture
Maxime Pelcat 24
Model of Architecture (MoA)
conform to
KPI
Architecture Model
KPI Evaluation
Algorithm Algorithm Model
Red
esign
Red
esign
Problem: Predict System Quality • How to predict a system « quality » ? –Efficiently (simple procedure)
–Early (from abstract models)
–Accurately (with a good fidelity)
–With reproducibility (same models = same prediction)
25 Maxime Pelcat
Model of Architecture • Definition –Model of a system Non-Functional Property
–Application-independent
–Abstract
–Reproducible
26 Maxime Pelcat
Pelcat, M; Mercat, A; Desnos, K; Maggiani, L; Liu, Y; Heulot, J; Nezan, J-F; Hamidouche, W; Ménard, D; Bhattacharyya, S (2017) "Reproducible Evaluation of System Efficiency with a Model of Architecture: From Theory to Practice", IEEE TCAD.
Model of Architecture
Maxime Pelcat 27
Model Reproducible
Application-independent
Abstract
AADL
MCA SHIM
UML MARTE /
AAA
CHARMED
S-LAM
MAPS
LSLA
NFP = MoA( ) activity( )
MoA depends on MoC
Model of Architecture
28
One and always the same quality evaluation
Model H conforms to MoA Model G conforms to MoC
Activity
MoC( )
Maxime Pelcat
application Performance
Power
Energy Memory T°C
Reliability
Security Cost
LSLA: First MoA • LSLA = Linear System-Level Architecture
Model
• Motivated by the additive nature of energy consumption
Maxime Pelcat 30
System Objectives
Maxime Pelcat 31
T°C
Energy
Reliability
Memory
Unit Cost $
Security Maintenance Cost
$
Performance
Peak Power
Energy/Power Define Architecture
0 20W 20kW 20MW
Need a dissipator
2W 7W
Need a fan
Embedded system Dedicated system
or conventional system
HPC
HPeC
influence
LSLA Model of Architecture
33
Task1 signal
signal Task2 Task3
Task4
Task5
1
1
1
1
1
1
1
PE1 PE2 CN
10x+1
2x+0 3x+0
16+12+22=50
Maxime Pelcat
token
quantum
Compositional
LSLA Model of Architecture
34
Task1 signal
signal Task2 Task3
Task4
Task5
1
1
1
1
1
1
1
PE1 PE2 CN
10x+1
2x+0 3x+0
16+12+22=50
Maxime Pelcat
SDF: Model of Computation
Activity
LSLA: Model of Architecture
LSLA MoA for Energy Prediction • The model is learnt from energy
measurements
36
PE PE
CN
PE PE
PE PE
CN
PE PE
CN
Maxime Pelcat
LSLA MoA for Energy Prediction • The model is learnt from energy
measurements
37
PE PE
CN
α 1.5W 1.5W
PE PE 1.5W 1.5W
PE PE
CN
γ
0.3W 0.3W
PE PE 0.3W 0.3w
CN
β
Maxime Pelcat
LSLA: MoA, not MoHW • LSLA models HW + communication
libraries + scheduler + Oss +…
• LSLA models the service the platform offers to the applications
• Top-down approach –Learning parameters from experiments
Maxime Pelcat 38
System Objectives
Maxime Pelcat 39
T°C
Energy
Reliability
Memory
Unit Cost $
Security Maintenance Cost
$
Latency
Peak Power
MoAs: Limits of LSLA • Energy Linear model OK • Latency
• Latency does not have an additive nature
40
Maxime Pelcat
Task1 Task2 1
1
1
Task1
Task2
1
1
1
1
Latency = sum
Latency = max
!
Activity & MoA for Latency
41
Task1 signal
signal Task2 Task3
Task4
Task5
1
1
1
1
1
1
1
SDF
a)
b)
Maxime Pelcat
c)
Activity & MoA for Latency
42
PE1 PE2 CN
10x+1
2x+0 3x+0
Σ 12+12+11=35
Σ 8+6+11=25
max(35,25)=35
a)
b)
Maxime Pelcat MaxPlus
System Prototype
Accuracy? No, Fidelity!
Maxime Pelcat 44
Architecture
Design
Algorithm Application
Redesign
Red
esign
H2020 Cerbero Toolchain
Maxime Pelcat 48
VT
AOW DynAA
PAPI
SPIDER JADE
ARTICO3 MDC
Intermediate Representation
C++
System Model
Application / Architecture
Runtime Support
Low-Level Implementation (Hardware Abstraction)
PREESM
MECA End-user interaction
GdR SOC2 • Groupement de recherche SOC2 – Systems on a Chip, Connected Systems
– Industrial partner club
Maxime Pelcat 49
Takeaway Message • MoAs can early predict performance/quality –Especially for HPeC systems
• MoAs are not HW Models – They model HW + protocols + OS + …
• MoAs are built/learnt top-down – They can and should be simple
• The need for MoAs may rise –Due to Fog/Edge Computing complexity
Maxime Pelcat 52
KPI
MoA MoC
Act
Questions?
Maxime Pelcat 53
Pelcat, M; Mercat, A; Desnos, K; Maggiani, L; Liu, Y; Heulot, J; Nezan, J-F; Hamidouche, W; Ménard, D; Bhattacharyya, S (2017) "Reproducible Evaluation of System Efficiency with a Model of Architecture: From Theory to Practice", IEEE TCAD.
www.cerbero-h2020.eu
http://preesm.org