From CPS to High-end computing:
common problems and synergies
● Overview of the research competencies and achievements
● 1st Workshop on Embedded Systems
● Pisa, September 20, 2016
● Contact: prof. William Fornaciari
● +39-02-2399.3504
● [email protected], home.deib.polimi.it/fornacia
In a nutshell: Keywords
● System-level low power design
● Software energy optimization
● Real-time operating systems
● Multi/many-core architectures
● Power, thermal, energy management
● Reliability, robustness
● Networks on Chip (NoC)
● Design space exploration
● Scheduling for soft real-time on multi/many cores
● Mapping applications onto parallel architectures
● Run-time resource management
● Design flows and co-simulation
● Compilers, programming paradigms
● Wireless sensor networks
● Privacy, security
● Adaptive systems
In a nutshell: Projects
EU-funded Projects
● H2020 (Kick-off 2015-2016)
● MANGO, FET (HPC architectures and RTRM)
● M2DC (High-end embedded applications, security and RTRM)
● Safecop, ECSEL (Embedded systems)
● ANTAREX, FET (HPC programming models and RTRM)
● Current FP7 (kick-off Sept.-Oct. 2013)
● HARPA (Thermal reliability, Runtime Mgmt), Project Coordinator
● CONTREX (Embedded Systems, including WSNs and distributed ES)
● Past FP7
● P3S (EIT, cyber-physical systems)
● 2PARMA (RTRM, OpenCL, DSE), POLIMI Coord., ranked as a “success story” by the EU
● COMPLEX (Embedded Systems, including WSNs), WPL
● SMECY (Embedded computing), TL
● MULTICUBE (DSE on multi/many cores), POLIMI Coord.
● OpenMediaPlatform (embedded virtual machines), POLIMI Coord.
● Past (before FP7)
● WASP (WSNs); PEOPLE (sw power estimation), WPL; POET (sw power optimization), WPL; SEED (Hw/Sw Codesign), coordinator
In a nutshell: People
Main scientists
● 5 associate professors (quite senior)
● 6 assistant professors, 4 post-docs
● 10 PhD students, a dozen Master students
Computer Continuum: synergies and overlaps
Layers
Apps
Problems & Solutions Outputs & Tools
Many cores, HPC
Problems & Solutions:
● Thermal control for ageing and reliability
● Run-time load balancing
● Optimization of non-functional aspects
● Application mapping
● Power/energy coarse-grain monitoring and control
Outputs & Tools:
● Tip-Top patent filed in 2016 for thermal control (rack level)
● BarbequeRTRM HPC extension (open source + commercial customizations)
● OpenCL backend, OpenMP, MPI, …
● Compilers, DSE tools
Multi-cores, Heterog. Computing, High-End ES
Problems & Solutions:
● Load distribution on heterogeneous cores
● Power/energy fine-grain control
● Design of accelerators
● Reliability issues
Outputs & Tools:
● Tip-Top thermal control (firmware)
● BarbequeRTRM for several commercial boards (Odroid, x86, Zynq, Panda, …)
● NoC, Simulation toolchain (HANDS), Memory interface optimization
● DVFS exploitation
● Compilers, DSE tools
Low-end embedded systems
Problems & Solutions:
● Energy optimization
● Size, cost, multi-sensor boards, small-footprint OSs
Outputs & Tools:
● DVFS exploitation
● Low-level run-time optimization of energy and performance
● Application-specific design of software and firmware
● Development of analysis toolsuite
● Power attack countermeasures
Wearable CPS, IoT
Problems & Solutions:
● Design of ultra-low power boards with sensors, feature extraction, security and privacy
● WSN clock synchronization
Outputs & Tools:
● Methodology for clock synch in WSNs
● Development of platforms for wearable apps
● Use of georef sources of information and GPRS
● Miosix open source OS
● Privacy and security protocols
Chip Thermal modeling
NoC design and optimization
Sensor & Knobs
Tip-Top hw for thermal control
NoC power aware design
Simulation toolchain (HANDS)
Wireless Sensor Networks
● Management of the node to deal with not purely functional requirements
● Prototype running software capable of modifying the node behavior (mix of compile-time and run-time selection/loading of functions)
● WandStem: design of a wireless node (bare HW + in-house OS layer) with the concept of task hibernation
● Application and system software energy/performance estimation and optimization
● Energy scavenging
● Extensive experience gained in two large collaborative projects, good links to extend the coverage of topics
Past Projects: WASP (EU-IP project), ARTDECO (FIRB), WISEDAEMON (PRIN),
COMPLEX (IP, POLIMI WP leader)
Present FP7/H2020: CONTREX (IP, POLIMI WP leader), Safecop (ECSEL)
Time deterministic WSN networks
HW/SW approach
● Miosix: custom real-time OS
● A novel WSN node with hardware-assisted timing
● Time infrastructure with 21 ns resolution
● Ultra-low power
Sub-μs clock synchronization
● Compensating for propagation delays
● Compensating for temperature drift of clock crystals
● In multi-hop networks
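The two compensations named above can be illustrated with a generic linear clock model. The following is only a minimal sketch under stated assumptions: it is not the Reverse Flooding scheme nor the Miosix timing code, and every function name and numeric value is hypothetical.

```python
def estimate_skew(master_times, local_times):
    """Least-squares slope of the local clock versus the master clock.

    A slope of 1.0 means no drift; 1.00002 means the local crystal
    runs 20 ppm fast (for example because of temperature).
    """
    n = len(master_times)
    mean_m = sum(master_times) / n
    mean_l = sum(local_times) / n
    num = sum((m - mean_m) * (l - mean_l)
              for m, l in zip(master_times, local_times))
    den = sum((m - mean_m) ** 2 for m in master_times)
    return num / den


def corrected_offset(master_tx_time, local_rx_time, propagation_delay):
    """Offset of the local clock w.r.t. the master, net of the time of flight."""
    return local_rx_time - (master_tx_time + propagation_delay)


# Hypothetical timestamps in nanoseconds.
master_times = [0.0, 1.0e6, 2.0e6]
local_times = [500.0 + 1.00002 * m for m in master_times]
skew = estimate_skew(master_times, local_times)

# Ignoring a 100 ns propagation delay would overestimate the offset as 600 ns.
offset = corrected_offset(master_tx_time=1000, local_rx_time=1600,
                          propagation_delay=100)
print(skew, offset)
```

In a multi-hop network the propagation delay accumulates hop by hop, which is why leaving it uncompensated quickly exceeds a sub-μs error budget.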
References
A. Leva, F. Terraneo, S. Seva, I. Giacomello, "High-speed thermal
management for power-dense microprocessors" IEEE Conference on
Decision and Control (CDC), Las Vegas, USA, December 2016
A. Leva, F. Terraneo, W. Fornaciari, “Event-based control as an
enabler for high power density processors” IEEE International
Conference on Event-Based Control, Communication, and Signal
Processing (EBCCSP)
F. Terraneo, A. Leva, W. Fornaciari, "Demo: A High-Performance,
Energy-Efficient Node for a Wide Range of WSN Applications"
International Conference on Embedded Wireless Systems and
Networks (EWSN), Graz, Austria, February 2016
F. Terraneo, A. Leva, S. Seva, M. Maggio, A. V. Papadopoulos,
"Reverse Flooding: exploiting radio interference for efficient
propagation delay compensation in WSN clock synchronization" IEEE
Real-Time Systems Symposium (RTSS), San Antonio, Texas,
December 2015
Application Software:
Power/Energy Estimation and Optimization
• Accurate source-level (C) estimation of power consumption
• Coverage of library and operating system calls, not only pure application software
• SWAT: prototype toolchain based on LLVM
• Two orders of magnitude faster than a dedicated ISS, enabling design space exploration
• Accuracy of estimates in the range of 5% w.r.t. ISS (e.g., STM-REISC processor with a dozen benchmarks)
• Toolchain easy to extend to other processor architectures
• Accounts for advanced power management features, including voltage and frequency scaling
• Possibility to optimize software energy with design space exploration of source-level transformations
• Over 12 years of experience
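To see why voltage and frequency scaling matters for software energy estimates, the textbook first-order CMOS model of dynamic power can be used. This sketch is not the SWAT estimation model; it ignores static (leakage) power, and the capacitance, voltages and frequencies are hypothetical values chosen for illustration.

```python
def dynamic_power(c_eff, voltage, frequency):
    """First-order dynamic power: P = C_eff * V^2 * f (watts)."""
    return c_eff * voltage ** 2 * frequency


def energy_for_cycles(c_eff, voltage, frequency, cycles):
    """Energy (joules) to execute a fixed number of clock cycles."""
    exec_time = cycles / frequency
    return dynamic_power(c_eff, voltage, frequency) * exec_time


C_EFF = 1.0e-9        # hypothetical effective switched capacitance (farads)
CYCLES = 2.0e9        # hypothetical workload length in clock cycles

e_high = energy_for_cycles(C_EFF, voltage=1.2, frequency=2.0e9, cycles=CYCLES)
e_low = energy_for_cycles(C_EFF, voltage=0.9, frequency=1.0e9, cycles=CYCLES)

# The low operating point takes twice as long, but its dynamic energy
# scales with V^2 only: (0.9 / 1.2)^2 = 0.5625 of the high point.
print(e_low / e_high)
```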
Past EU projects: PEOPLE (estimation), POET (Optimization), COMPLEX (IP project, POLIMI WP
leader)
Present FP7: CONTREX (IP project, POLIMI WP leader)
Automatic Multi-Objective Design Space Exploration
• Fast exploration of the design space (HW and SW)
• Availability of a tool (MOST) capable of performing automatic multi-objective design space exploration, with a range of possible reports
• Used in projects to optimize multi/many-core architectures running high-end multimedia applications
• Exploration of the application-specific parameter space
• Standard XML interfaces to simplify integration with existing simulators and architectural models
• Support for run-time management
• Over 10 years of experience
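The core step of any multi-objective exploration is keeping only the non-dominated configurations (the Pareto front). The sketch below illustrates that step alone, assuming that all objectives are minimized; it does not use or reproduce the MOST tool, and the design points are made up.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates (order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical configurations as (energy in J, execution time in s).
designs = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 2.0), (4.0, 3.0)]
front = pareto_front(designs)
print(front)
```

Here (3.0, 6.0) is discarded because (2.0, 5.0) is better on both objectives, and (4.0, 3.0) because (4.0, 2.0) is faster at the same energy. This exhaustive filter is quadratic in the number of points, so it suits small design spaces.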
Past EU-Projects: MULTICUBE and 2PARMA (STREP projects, POLIMI coordinator)
Present FP7: CONTREX, HARPA, ANTAREX
HANDS (Heterogeneous Architectures and Networks-on-Chip Design and Simulation)
● DSE of power/performance and thermal/reliability trade-offs in high-performance processor architectures
● Thermal/reliability design solutions and optimizations
● Exploration of power and thermal aspects in Network-on-Chip design
● Reliability projection as a function of temperature profile (independent MTTF modeling)
● On-line NBTI estimation based on state-of-the-art models
● Impact of within-die random and systematic process variation on aging and performance
● NBTI mitigation of units in a superscalar processor and routers in a Network-on-Chip
William Fornaciari, Politecnico di Milano (Italy)
● GEM5: architecture block-level statistics
● McPAT and Orion 2.0: power consumption for processors and routers
● Floorplan + HotSpot: thermal profile
● MTTF, NBTI: reliability models
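As an illustration of projecting reliability from a temperature profile, the sketch below uses the generic Arrhenius temperature-acceleration model. It is an assumption made for illustration, not the reliability model implemented in HANDS; the activation energy, reference MTTF and temperatures are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_mttf(mttf_ref, t_ref_k, t_k, activation_energy_ev=0.7):
    """MTTF at temperature t_k, given the MTTF at a reference temperature."""
    accel = math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K
                     * (1.0 / t_ref_k - 1.0 / t_k))
    return mttf_ref / accel


def profile_mttf(mttf_ref, t_ref_k, profile, activation_energy_ev=0.7):
    """MTTF over a profile given as (time_fraction, temperature_K) pairs.

    Failure rates (1 / MTTF) are averaged with the time fractions as
    weights; the fractions are assumed to sum to 1.
    """
    rate = sum(frac / arrhenius_mttf(mttf_ref, t_ref_k, temp,
                                     activation_energy_ev)
               for frac, temp in profile)
    return 1.0 / rate


# Hypothetical numbers: 10 years at 330 K; half of the time spent at 360 K.
hot = arrhenius_mttf(10.0, 330.0, 360.0)
mixed = profile_mttf(10.0, 330.0, [(0.5, 330.0), (0.5, 360.0)])
print(hot, mixed)
```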
Present: launched in 2012, used at UMass Amherst and Chalmers
Present FP7: HARPA (STREP project, POLIMI WP leader), MANGO
Modelica thermal simulator (replaces HotSpot)
• Performing transient thermal simulations is a key feature of the proposed simulation flow
• It allows dynamic thermal policies to be assessed in realistic settings
• To achieve this goal, a thermal simulator was developed using the object-oriented modeling paradigm
• Object-oriented modeling is different from object-oriented programming
• Object-oriented programming → classes and abstract data types
• Object-oriented modeling → modeling using differential algebraic equations (DAE)
• Modeling languages such as Modelica are available to support this modeling paradigm
• Higher-level modeling: the language supports differential equations in the same way C supports assignment of an expression to a variable
• Component-oriented structuring of the simulator: components can be tested in isolation and connected using a GUI
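To make the idea of a transient thermal simulation concrete, the sketch below integrates a single-node lumped RC thermal model, C·dT/dt = P − (T − T_amb)/R, with the forward Euler method. It is a deliberately simplified Python illustration, not the Modelica simulator described above: in Modelica the equation would be stated declaratively with the `der()` operator and the solver chosen by the tool. All parameter values are hypothetical.

```python
def simulate_rc(power_w, r_th, c_th, t_amb, t0, dt, steps):
    """Forward-Euler transient of C * dT/dt = P - (T - T_amb) / R.

    power_w: dissipated power (W); r_th: thermal resistance (K/W);
    c_th: thermal capacitance (J/K); temperatures in degrees Celsius.
    Returns the temperature trace, including the initial value.
    """
    temps = [t0]
    temp = t0
    for _ in range(steps):
        dtemp_dt = (power_w - (temp - t_amb) / r_th) / c_th
        temp += dtemp_dt * dt
        temps.append(temp)
    return temps


# Hypothetical values: 10 W into R = 2 K/W, C = 0.5 J/K (time constant 1 s).
trace = simulate_rc(power_w=10.0, r_th=2.0, c_th=0.5,
                    t_amb=25.0, t0=25.0, dt=0.001, steps=10000)

# After ten time constants the temperature approaches the steady state
# T_amb + P * R = 45 degrees Celsius.
print(trace[-1])
```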
Modelica thermal simulator
● Replacement of HotSpot in the HANDS flow
● The simulator supports a configurable thermal dissipation stack
● It can also simulate 3D die-stacked chips, by instantiating multiple silicon layers
● This is the simplified layout of a 3D chip with two active silicon layers
● This is the same chip modeled in the proposed simulator, by graphically connecting components modeling the silicon layers and the thermal dissipation stack
Thermal management of multi/many-core architectures
• Modeling of the thermal status of a system, including the thermal coupling of cores and other system components, taking floorplan information into account
• Modeling of NoC thermal aspects in the design, for NBTI purposes
• Small computation overhead, compatible with inclusion in run-time management OS software
• Management policies tailored to increase the short- and long-term reliability of the systems
• Identification of the equivalent (actual) computational capability of clusters induced by thermal coupling among the cores; this information is used for both allocation and scheduling of tasks
• Prototype toolchain including a probing suite running on Linux/Intel architectures
• Metrics can be used both at design time and inside a system-level run-time manager (developed in 2PARMA, usable in HARPA)
• Validation carried out on 2 cores, 4 cores, 4x4 cores (AMD clusters), big.LITTLE
Past FP7: 2PARMA (STREP project, POLIMI coordinator), SMECY (Artemis)
Present FP7/H2020 projects: HARPA (STREP project, POLIMI coordinator), MANGO
Event-based thermal control
● Modern multicore processors exhibit fast temperature transients: >20 °C in a few milliseconds
● A fast and low-overhead control policy is required
● Solution: event-based thermal control
● Based on a control-theoretical model
● Hardware event generator
● Software controller
● Fixed-rate control cannot prevent fast temperature transients
● Event-based control keeps the temperature within the limit
● The event-based controller generates many events when the temperature changes rapidly, and few events when the temperature is nearly constant
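The behavior described above (many events during fast transients, few when the temperature is steady) can be reproduced with a simple send-on-delta trigger. The slides do not specify the triggering rule of the hardware event generator, so this rule, the threshold and the synthetic trace are assumptions made only for illustration.

```python
def send_on_delta_events(samples, delta):
    """Indices where the temperature moved at least `delta` since the last event."""
    events = []
    last = samples[0]
    for i, temp in enumerate(samples[1:], start=1):
        if abs(temp - last) >= delta:
            events.append(i)
            last = temp
    return events


# Synthetic trace: steady at 60 C, a fast ramp to 80 C, then steady again.
trace = ([60.0] * 50
         + [60.0 + 2.0 * k for k in range(1, 11)]
         + [80.0] * 50)

events = send_on_delta_events(trace, delta=1.0)

# All ten events fall inside the ramp (indices 50..59); the two flat
# segments generate none, so the controller runs only when needed.
print(events)
```

A fixed-rate controller sampling this trace every 20 samples would observe at most one point of the ramp, which is the limitation the slide attributes to fixed-rate control.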
Heterogeneous Systems for Mobile
Example: Samsung Exynos 5422
● ARM big.LITTLE CPU-based SoC
● 4 Cortex A15 @ 2 GHz → high performance
● 4 Cortex A7 @ 1.4 GHz → low power
● Heterogeneous Multi-Processing (HMP)
● The 8 cores can be used simultaneously
● A wide range of power/performance operating points
Exploitation
● Performance-hungry tasks on big cores, remaining tasks on LITTLE cores
● Can we use these platforms on remote systems?
● What about thermal management?
● What about energy-budget management?
ODROID-XU3/4 board
Use case: HENESIS “Beesper” landslide prediction system
● Exynos 5422 SoC platform exploitation
● Solar-panel and battery powered
● Monitoring activity in remote areas
● 24-hour availability required
Proposed solution: a run-time resource manager
● Computing resources assigned to applications according to:
● Application requirements
● HW platform status (thermal, energy budget, ...)
● Cortex A15 @ 2 GHz: ~6.x W power consumption → 96 °C (with fan)
● Cortex A7 @ 1.4 GHz: ~1 W power consumption
● Control the assignment of Cortex A15 CPU time depending on the current chip temperature and the battery/solar-panel energy/power budget
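A policy of this kind can be sketched as a function that maps the platform status to the share of Cortex A15 CPU time granted to applications. This is a hypothetical illustration, not a BarbequeRTRM policy or its API: the function name, the temperature limit and margin, and the rounded power figures (6.0 W and 1.0 W) are assumptions.

```python
def big_core_share(temp_c, power_budget_w, temp_limit_c=85.0,
                   temp_margin_c=15.0, big_power_w=6.0, little_power_w=1.0):
    """Fraction (0.0 to 1.0) of Cortex A15 CPU time to grant.

    The share is limited by whichever headroom is smaller: thermal
    (distance from the temperature limit) or energy (power budget left
    after reserving what the LITTLE cores need).
    """
    # 1.0 when at least temp_margin_c below the limit, 0.0 at or above it.
    thermal = min(1.0, max(0.0, (temp_limit_c - temp_c) / temp_margin_c))
    # Share of the extra big-core power that the current budget can cover.
    energy = min(1.0, max(0.0, (power_budget_w - little_power_w)
                          / (big_power_w - little_power_w)))
    return min(thermal, energy)


cool_and_charged = big_core_share(temp_c=60.0, power_budget_w=10.0)
overheated = big_core_share(temp_c=90.0, power_budget_w=10.0)
low_battery = big_core_share(temp_c=60.0, power_budget_w=3.5)
print(cool_and_charged, overheated, low_battery)
```

Taking the minimum of the two headrooms makes the policy conservative: either a hot chip or a depleted battery is enough to push work onto the LITTLE cores.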
The BarbequeRTRM
● Run-time resource manager
● Open-source project
● Involved in EU FP7 and H2020 projects
● Modular resource manager on top of Linux OS
● Several resource allocation policies
● Heterogeneous and homogeneous systems support
● Distributed system support (under development)
● Android devices (under development)
● The framework is open source; company-specific customizations are provided for a fee by a startup
http://bosp.dei.polimi.it
Past FP7 projects: 2PARMA (STREP project, POLIMI coordinator), SMECY (Artemis)
Present FP7: HARPA (STREP project, POLIMI coordinator), CONTREX (IP, POLIMI WPL), MANGO
Parallel programming models and dynamic compilation
● Dynamic compilation to adapt parallelism to system resources
● OpenCL mapping and optimization for computing fabrics
● Mixing data parallelism with pipeline parallelism
● Exploiting multiple parallel devices effectively
● Optimizations for data parallel programming
● Expressing target-independent thread data affinity in data parallel languages
● Exploiting thread data affinity for task scheduling
● High-performance design and implementation of parallel programming constructs
H2020: ANTAREX (POLIMI coordinator), MANGO
Past FP7: 2PARMA (STREP project, POLIMI coordinator),
SMECY (Artemis), OpenMediaPlatform (Dynamic compilation)
Contact: prof. Giovanni Agosta ([email protected])
Compiler Construction
• Co-development of compilers and ISA extensions
• Compilers for special-purpose languages (e.g., OpenCL)
• Versioning compiler
• JIT technology
• Reverse engineering & binary-to-binary translation
• Compiler support for extra-functional properties
• Adaptivity
• Security
Contact: prof. Giovanni Agosta ([email protected])
Privacy and Security
● Data Privacy: anonymous access to outsourced data
● Security of embedded systems:
Side-channel attacks: power, EM, timing
Vulnerability analysis: based on static data-flow analysis
Countermeasures:
• Automated instantiation of countermeasures
• Novel countermeasures based on code morphing
High performance implementation of encryption primitives
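As a small example of a timing side channel and a standard software countermeasure, the sketch below contrasts an early-exit byte comparison with the constant-time comparison from the Python standard library. It is a generic illustration and unrelated to the code-morphing or automatically instantiated countermeasures mentioned above; the tag values are made up.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time reveals the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose running time does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


expected_tag = b"secret-tag"
good = constant_time_equal(expected_tag, b"secret-tag")
bad = constant_time_equal(expected_tag, b"secret-tah")
print(good, bad)
```

Both functions return the same results; only their timing behavior differs, which is what an attacker measuring response times could exploit in the first one.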
H2020: M2DC (high performance encryption on FPGA), SafeCop (wireless communication
security & data privacy in IoT scenarios)
Past FP7: ENIAC TOISE
Contact: prof. Gerardo Pelosi ([email protected])
Applied Cryptography and Computer Security
• Side Channel Attacks
• Power, Fault, EM attacks
• Countermeasure design
• Security assessment
• Security of cyber-physical systems
• Authentication protocol design
• Efficient implementation of cryptographic primitives
• ASIC
• FPGA
• GPGPU
Contact: prof. Gerardo Pelosi and Alessandro Barenghi