GENIVI Lifecycle Webcast 30th January 2014 Lifecycle Webcast.pdf · Startup/Shutdown Management 6...
29-Jan-14
Dashboard image reproduced with the permission of Visteon and 3M Corporation
GENIVI is a registered trademark of the GENIVI Alliance in the USA and other countries
Copyright © GENIVI Alliance 2013 . This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.
GENIVI Lifecycle Webcast
30th January 2014
David Yates, Continental Automotive GmbH
Lifecycle topic owner and SysArch Member
Scope of Presentation
The aim of this presentation is to provide an overview of the Lifecycle architecture within GENIVI, detailing where we believe the automotive world requires extensions to existing open source solutions.
The following topics will be covered:
• Welcome & Introduction
• Lifecycle Domain Overview
• Component Overview
• Startup/Shutdown Concept
• Introduction to NSM (session management)
• Introduction to Resource Management
• Roadmap
• Call to action
• Location of further information (AMM presentations)
Lifecycle Domain Overview
[Diagram: the Lifecycle Domain, showing Node State Management, Resource Management, Boot Management, Supply Management, Thermal Management and Power Management, alongside Persistency and Log & Trace.]
Functions and interfaces labelled in the diagram:
• Get internal supply/thermal states
• Supply state change notification
• Thermal state change notification
• Set events (session information)
• Request state change
• Get node states
• Node state change notification
• Set last user context
• Resource limitation
• Run time observation
• Startup/shutdown management
Lifecycle Manifest
[Diagram: Lifecycle Manifest, relating packages to components through <<refine>> relationships.]
Packages: Node State Management, Resource Management, Boot Management, Supply Management, Thermal Management, Power Management
Components: Node State Manager, Node State Machine, Node Startup Controller, systemd, cgroup service, Node Resource Mgr, Node Health Monitor, Power Event Collector, Supply Manager, Thermal Manager
Legend: Package, Product Component, Platform Component
Lifecycle Concept
[Diagram: Lifecycle Concept. Labels recovered from the diagram, grouped by the area they appear next to:]
Supply Management
• Plug-in for: ADC, PMIC
• Reaction on conditions: turn off display, drives, mute audio, …
Thermal Management
• Plug-in for: sensors, devices
• Reaction on conditions: turn on fan, reduce audio volume, …
Power Management
• Plug-in for: wakeup reason, node / vehicle network
• Events: Clamp State, Button WU, Bus WU, Vehicle Network
Node State Management
• HMI, Phone, …, SWL/Update, Diagnostics
• Session handling (Phone, Diag, SWL, …)
• Node State change protocol
• Shutdown management
• State change notifications
Boot Management
• Boot config
• Set LUC (Last-User-Context)
Resource Management
• Plug-in for: application-specific observing and recovering
• Node observing for CPU load, memory, application crash
• Resource config
Other labels: State chart (three times); Good, Poor, Bad; Node; Ctrls
Startup/Shutdown Management
Boot Management takes care of Startup Management.
Node State Management takes care of Shutdown Management.
Why do we have this split?
systemd stops and unloads all components as part of its shutdown concept. If a shutdown is cancelled, it then takes a lot of time to make them functional again.
An IVI system must be able to resume operation without losing any context and without the need for a reboot. Therefore, Node State Management only calls registered consumers in the shutdown phase. This event notification drives the components into a stable state and ensures that everything needed for the next startup has been stored.
With this approach alone, components would never actually be shut down, although this is required for certain exceptions such as the flash file system. The shutdown management concept therefore also uses the systemd shutdown concept where appropriate for legacy or critical components.
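The consumer-based shutdown described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Node State Manager API: the names ShutdownNotifier, Component and NodeState are hypothetical, and notifying consumers in reverse registration order is an assumption of this sketch.

```python
from enum import Enum
from typing import Callable, List, Tuple


class NodeState(Enum):
    RUNNING = "running"
    SHUTTING_DOWN = "shutting_down"


# A consumer is a callback that is told the new node state.
Consumer = Callable[[NodeState], None]


class ShutdownNotifier:
    """Hypothetical stand-in for the consumer handling described above."""

    def __init__(self) -> None:
        self._consumers: List[Tuple[str, Consumer]] = []

    def register(self, name: str, consumer: Consumer) -> None:
        self._consumers.append((name, consumer))

    def prepare_shutdown(self) -> None:
        # Assumption: consumers are notified in reverse registration order.
        for _name, consumer in reversed(self._consumers):
            consumer(NodeState.SHUTTING_DOWN)

    def cancel_shutdown(self) -> None:
        # Nothing was unloaded, so resuming is just another notification.
        for _name, consumer in self._consumers:
            consumer(NodeState.RUNNING)


class Component:
    """Example consumer that stays loaded and only stores its context."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log
        self.loaded = True  # the notifier never unloads the component

    def on_node_state(self, state: NodeState) -> None:
        if state is NodeState.SHUTTING_DOWN:
            self.log.append(f"{self.name}: context stored")
        else:
            self.log.append(f"{self.name}: resumed")
```

Because the components are only notified and never unloaded, a cancelled shutdown costs one more round of callbacks rather than a restart of every service.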
Shutdown preparation in Startup Phase
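[Diagram: startup sequence and Node State Manager states.]
Before systemd: kernel, initrd
systemd targets (labelled "Runlevel replacement" and "GENIVI extensions"):
• Mandatory targets (Base System & Early Features)
• focussed.target (last user context)
• unfocussed.target(s)
• lazy.target
Node State Manager (Start NSM via systemd), with the node states shown:
• BASE_RUNNING (during Node Startup Controller init)
• LUC_RUNNING
• FULLY_RUNNING
• FULLY_OPERATIONAL
Labels A, B, C and J appear along the startup sequence.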
Shutdown Execution
[Diagram: shutdown sequence. The Node State Manager is shown with Consumer J down to Consumer A. Further labels: Node Startup Controller, "Writing LUC", systemd, app1.service, app2.service, Shutdown.target (flash file systems).]
Enables:
1. Shutdown activities are triggerable without unloading the components.
2. Legacy components can be shut down in their traditional way.
3. Full flexibility on where to integrate systemd based shutdown units.
NSM Session Management
[Diagram: NSM session management.]
• Components shown: Node State Manager, Node State Machine
• Clients shown: Phone, SWL, Audio, HMI, Navigation (Navi)
• Connections labelled: Events/Data, Set method, Signal
• Data shown: Node State, Node Session State (PhoneSession, SWLSession, …)
• Lifecycle Requests shown: Shutdown, Request system restart
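The session handling shown on this slide (clients set session states, interested parties are signalled) can be illustrated with a minimal sketch. It is not the Node State Manager interface: SessionRegistry, its methods and the "active"/"inactive" state values are hypothetical; only the session names PhoneSession and SWLSession come from the slide.

```python
from typing import Callable, Dict, List

# A listener receives the session name and its new state.
SessionListener = Callable[[str, str], None]


class SessionRegistry:
    """Hypothetical store for session states with change notification."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}
        self._listeners: List[SessionListener] = []

    def subscribe(self, listener: SessionListener) -> None:
        self._listeners.append(listener)

    def set_session_state(self, name: str, state: str) -> bool:
        """Store the state; signal listeners only if it actually changed."""
        if self._sessions.get(name) == state:
            return False
        self._sessions[name] = state
        for listener in self._listeners:
            listener(name, state)
        return True

    def get_session_state(self, name: str) -> str:
        # Assumption: a session nobody has set yet counts as inactive.
        return self._sessions.get(name, "inactive")

    def active_sessions(self) -> List[str]:
        return sorted(n for n, s in self._sessions.items() if s == "active")
```

A node state machine could, for example, consult active_sessions() before agreeing to a shutdown, so that an ongoing phone call or software load is not interrupted.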
Resource Management - Goals
Resource management contains the functionality to ensure that the node runs in a stable
and defined manner.
To do this, it will monitor and limit different aspects of SW component behavior, including
system resources (e.g. CPU load and memory) and critical run-time observation.
Resource allocation will be configurable on a component basis through the use of cgroups.
Health management will provide a configurable escalation strategy defining actions to be
taken in the case of system failures.
Note: security handling for resources (i.e. restricting access to resources) is not
included.
Health Management
Health Management will ensure that the node runs in a stable and defined manner. To do this, the following multi-layered observation system and escalation strategy is planned:
[Diagram: health management layers.]
• Components shown: Applications, Recovery Apps, Recovery Client, systemd, NHM, NSM, Persistence, /dev/watchdog
• Interactions labelled: start/restart; notify alive; notify alive and monitor node status; monitoring of userland; execute recovery; request app restart; request node restart; read/write data; delete app data
• Forward NHM heartbeat externally or to internal HW Watchdog
Concepts for the System
Health Management - NHM
The Node Health Monitor (NHM) will work in conjunction with systemd to monitor component
failures in the system. It will be responsible for:
• Monitoring systemd to automatically record and track application failures
• Providing an interface with which components can register failures when not using the
systemd watchdog mechanism
• Maintaining failure statistics over multiple lifecycles for the system and components
  – the service name will be used to identify and track component failures
  – statistics on the number of failures in a number of lifecycles will be maintained (e.g. 3
  failures in the last 32 lifecycles)
• Monitoring the wakeup and shutdown events to catch unexpected system restarts
• Providing an interface for components to read system and component error counts
• Providing an interface for recovery applications to request a node restart
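The failure statistics described in the list above (failures per service name over a window of lifecycles) could be kept with a small sliding window. The sketch below is an assumption about one possible data structure, not the NHM implementation: FailureStatistics and its method names are hypothetical, and the window of 32 lifecycles is taken from the example on the slide.

```python
from collections import deque
from typing import Deque, Dict


class FailureStatistics:
    """Hypothetical per-service failure counter over the last N lifecycles."""

    def __init__(self, window: int = 32) -> None:
        self._window = window
        # One slot per lifecycle; the last slot is the current lifecycle.
        self._counts: Dict[str, Deque[int]] = {}

    def _slots(self, service: str) -> Deque[int]:
        if service not in self._counts:
            self._counts[service] = deque([0], maxlen=self._window)
        return self._counts[service]

    def register_failure(self, service: str) -> None:
        """Record one failure of the named service in the current lifecycle."""
        self._slots(service)[-1] += 1

    def start_new_lifecycle(self) -> None:
        """Open a new slot; the oldest one drops out once the window is full."""
        for slots in self._counts.values():
            slots.append(0)

    def failure_count(self, service: str) -> int:
        """Failures of the service within the lifecycle window."""
        return sum(self._counts.get(service, ()))
```

A real implementation would also have to persist these counters, since the statistics must survive node restarts.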
Concepts for the System
Health Management – NHM (continued)
Additionally, the Node Health Monitor will test a number of product-defined criteria with the
aim of ensuring that userland is stable and functional. For instance, it will be able to validate
that:
• there is enough free system memory
• the CPU is not reporting an excessively high load for a sustained period
• defined files are accessible
• defined processes are still running
• communication is possible (D-Bus)
• a user-defined process can be executed with an expected result
If the NHM believes that there is an issue with userland, it will be able to initiate
a system restart.
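The product-defined userland checks listed above lend themselves to a table of named checks that are run periodically. The following sketch assumes a Unix-like system; run_checks, file_is_accessible and load_is_acceptable are hypothetical helper names, and the load threshold is an arbitrary example value.

```python
import os
from typing import Callable, Dict, List

# A check returns True when the criterion is met.
Check = Callable[[], bool]


def file_is_accessible(path: str) -> bool:
    """True if the file exists and is readable by this process."""
    return os.access(path, os.R_OK)


def load_is_acceptable(max_load_per_cpu: float = 2.0) -> bool:
    """Compare the 5-minute load average (a sustained value) to a limit."""
    _one, five_minutes, _fifteen = os.getloadavg()
    return five_minutes / (os.cpu_count() or 1) <= max_load_per_cpu


def run_checks(checks: Dict[str, Check]) -> List[str]:
    """Run every check and return the names of those that failed.

    A check that raises an exception is treated as failed, so that one
    broken check cannot stop the remaining ones from running.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

A monitor could call run_checks() on a timer and request a node restart only when the returned list stays non-empty over several runs; that policy is left out here because the slides do not define it.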
Concepts for the System
Health Management – Node Wdog
It is proposed to use a low-level HW watchdog, where supported, to validate that systemd
is running correctly.
A typical watchdog implementation can initiate an emergency shutdown process
when it believes that a failure has occurred:
– idle init, so nothing new can be started
– kill all processes
– write a reboot record to wtmp
– turn off accounting
– turn off quota
– turn off swap
– unmount all mounted partitions
NOTE: In this scenario a normal system shutdown will not be completed, so
cached persistent data from that lifecycle will be lost.
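As a hedged illustration of the heartbeat side of a HW watchdog: on Linux, a watchdog device such as /dev/watchdog is armed when it is opened and is reset by any write to it. Where the driver supports "magic close", writing the character V before closing disarms it. The class name WatchdogFeeder and the disarm flag are hypothetical; in the concept presented here it is systemd or the NHM that would send these heartbeats.

```python
import os


class WatchdogFeeder:
    """Minimal sketch of feeding a Linux watchdog device."""

    def __init__(self, path: str = "/dev/watchdog") -> None:
        # Opening the device arms the watchdog timer.
        self._fd = os.open(path, os.O_WRONLY)

    def kick(self) -> None:
        """Any write resets the watchdog timeout."""
        os.write(self._fd, b"\0")

    def close(self, disarm: bool = False) -> None:
        """Close the device, optionally disarming it via magic close."""
        if disarm:
            # Only honoured if the driver supports magic close.
            os.write(self._fd, b"V")
        os.close(self._fd)
```

If the feeding process stops calling kick(), the hardware resets the node after the configured timeout, which is exactly the situation in which cached persistent data is lost.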
Concepts for the System
Health Management - systemd
systemd provides watchdog functionality for monitoring and restarting failing services in
the system, and for sending heartbeats itself to a HW watchdog.
Within a service unit file, systemd can be configured to expect a heartbeat
from the service within a particular time interval (WatchdogSec=).
If this heartbeat is not received, options in the application's unit file direct systemd
how to behave. Typically this results in the application being
restarted automatically (Restart=).
The problem is that this can result in a cyclic restart scenario, with only limited options
(StartLimitInterval=, StartLimitBurst=) to influence the restart behavior.
Therefore, it is proposed that recovery applications are started automatically by systemd
(OnFailure=) in case of failures.
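From the service's side, the heartbeat expected under WatchdogSec= is sent with the sd_notify protocol: systemd passes a datagram socket address in the NOTIFY_SOCKET environment variable and the timeout in WATCHDOG_USEC, and the service sends the text WATCHDOG=1 to that socket. The sketch below implements this in plain Python; the function names are hypothetical, and sending at half the timeout follows the usual recommendation rather than anything stated on the slide.

```python
import os
import socket
from typing import Optional


def sd_notify(message: str) -> bool:
    """Send a notification to systemd; return False if not supervised."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True


def heartbeat_interval_seconds() -> Optional[float]:
    """Half of the watchdog timeout, or None if no watchdog is configured."""
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2


def send_heartbeat() -> bool:
    """Tell systemd that the service is still alive."""
    return sd_notify("WATCHDOG=1")
```

A service would call send_heartbeat() from its main loop every heartbeat_interval_seconds(); if it hangs, the heartbeats stop and systemd applies whatever the unit file specifies through Restart= and OnFailure=.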
Concepts for the System
Health Management – Recovery Client
A “Recovery Client” is a component that is executed when a failure has been detected in
the system. There can be a one-to-one relationship between apps and recovery clients, or
one client can handle multiple apps. It should contain enough functionality to be able to:
• request the error status count from the NHM
  – providing the name of the failing service file
• attempt recovery based on the error count, for instance:
  – if a file system fails to mount, the recovery action could be to format the file
  system and request a node restart
  – if an application has failed multiple times, we may want to delete that
  application's persistency data and restart the application
  – when possible, request that the SW is uninstalled or rolled back
• request systemd to restart the application
• request a node restart via the NHM
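The escalation a Recovery Client performs on the basis of the error count can be reduced to a small decision function. This is a sketch under stated assumptions: the action names and both thresholds are hypothetical example values, since the slides leave the concrete strategy to the product.

```python
from enum import Enum


class RecoveryAction(Enum):
    RESTART_APP = "restart_app"
    DELETE_DATA_AND_RESTART_APP = "delete_data_and_restart_app"
    REQUEST_NODE_RESTART = "request_node_restart"


def choose_recovery_action(
    error_count: int,
    delete_data_threshold: int = 3,   # hypothetical example value
    node_restart_threshold: int = 5,  # hypothetical example value
) -> RecoveryAction:
    """Pick an escalation step from the error count reported by the NHM."""
    if error_count < 0:
        raise ValueError("error_count must not be negative")
    if error_count >= node_restart_threshold:
        return RecoveryAction.REQUEST_NODE_RESTART
    if error_count >= delete_data_threshold:
        return RecoveryAction.DELETE_DATA_AND_RESTART_APP
    return RecoveryAction.RESTART_APP
```

Keeping the decision separate from the actions makes the strategy easy to configure per application, while the actions themselves go through systemd, Persistence and the NHM as listed above.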
Resource Management
[Diagram: Resource Management. Components shown: Node Resource Manager, systemd, cgroups, Application Component, Node State Manager (Node State Mgmt) and Supply Control Logic, linked by <<refine>> relationships. A further label reads "P3".]
Responsibilities annotated in the diagram:
• Monitor system resources
• Kill resource abusers
• Start services
• Configure cgroups
• Evaluate node restart requests
• Handle node restart requests
• Control system resources
• Report/handle resource allocation errors
Example cgroup configuration (CPU)
[Diagram: example cgroup hierarchy for CPU. Components shown: Radio, NAV, Browser, Media, Phone, PDC, Safety, Cameras, Diagnostics, SW Loading, Kernel, Infrastructure Services, Positioning, Speech, 3rd party APPS, Vehicle Network, Comm Stacks, Background tasks, Weather.]
Groups and settings:
• ROOT: Unlimited
• AUTOMOTIVE: cpu.shares = 50, runtime = 100, period = 1000
• APPS: cpu.shares = 20, runtime = 500, period = 2000
• BGND: cpu.shares = 1
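To make the example CPU values concrete: cpu.shares values are relative weights that only matter when groups compete for the CPU. The sketch below works out what the values 50, 20 and 1 would mean under full contention. It assumes that AUTOMOTIVE, APPS and BGND are sibling groups, and that runtime and period express an allowed run time per period in the same time unit. Both are assumptions, as the slide does not state them.

```python
from typing import Dict


def contended_cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Fraction of CPU each sibling group gets when all of them are busy."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive share")
    return {group: value / total for group, value in shares.items()}


def bandwidth_fraction(runtime: int, period: int) -> float:
    """Upper bound on CPU time expressed as runtime per period."""
    if period <= 0:
        raise ValueError("period must be positive")
    return runtime / period


if __name__ == "__main__":
    example = {"AUTOMOTIVE": 50, "APPS": 20, "BGND": 1}
    for group, fraction in contended_cpu_fractions(example).items():
        print(f"{group}: {fraction:.1%} of CPU under full contention")
    print(f"AUTOMOTIVE limit: {bandwidth_fraction(100, 1000):.0%}")
    print(f"APPS limit: {bandwidth_fraction(500, 2000):.0%}")
```

With these weights the background group is guaranteed only 1/71 of the CPU when everything is busy, but it can still use idle CPU time, because shares do not cap usage on an otherwise idle system.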
Example cgroup configuration (Memory)
[Diagram: example cgroup hierarchy for memory. Components shown: Radio, NAV, Browser, Media, Phone, PDC, Safety, Cameras, Diagnostics, SW Loading, Kernel, Infrastructure Services, Positioning, Speech, 3rd party APPS, Vehicle Network, Comm Stacks, Background tasks, Weather.]
Groups and settings:
• ROOT: Unlimited
• APPS: memory.limit_in_bytes = 200M, …
• BGND: memory.limit_in_bytes = 10M, …
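The memory limits in this example use size suffixes. As a small worked illustration, the helper below converts such a value into bytes, treating K, M and G as powers of 1024. The function name parse_memory_limit is hypothetical and not part of any cgroup tooling.

```python
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_memory_limit(text: str) -> int:
    """Convert a limit such as '200M' into a number of bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty memory limit")
    suffix = text[-1].upper()
    if suffix in _SUFFIXES:
        # Size given with a K/M/G suffix.
        return int(text[:-1]) * _SUFFIXES[suffix]
    # No suffix: the value is already in bytes.
    return int(text)
```

For the values on the slide, the APPS limit of 200M corresponds to 209715200 bytes and the BGND limit of 10M to 10485760 bytes.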
Lifecycle Roadmap
[Diagram: component status across Gemini, Horizon and Roadmap.]
Components shown: cgroup service, Node State Machine, Node Resource Mgr, Node Health Monitor, systemd, Node Startup Controller, Node State Manager
Each component is marked as specific, abstract or placeholder, and annotated with one of the following:
• Adopted components from the OSS community
• Owned component, funded by GENIVI, implemented by Codethink
• Owned component, implemented by Continental
• Owned component, to be implemented by Continental
• Product specific extension to the Node State Manager
Call to action
We hope today’s presentation has interested you in what we are working on within GENIVI, and in the Open Source Software that we have already released and plan to release in the future.
The components described today have been defined and created within the GENIVI consortium as Open Source Software under the MPLv2 licence.
For that reason the code is freely available in a public git repository outside of GENIVI.
If you are interested in the components and can see other potential uses in your domains, please check out the links on the following slides.
We are very open to input and requirements from all interested parties, so please ask questions and get involved.
http://projects.genivi.org/node-state-manager/about
Call to action continued
For those already working inside GENIVI who wish to contribute directly in the System Infrastructure group: we are always looking for more participants and have many ongoing topics that might interest you:
• Persistency
• User Management
• SW Management
• IPC
• Diagnostics
Please feel free to check out the GENIVI Wiki page where you can find more information about the above topics and how to participate in our weekly telephone conference calls.
https://collab.genivi.org/wiki/display/genivi/System+Infrastructure+Expert+Group
Further Information
Further information on the GENIVI Lifecycle concept can be found in the GENIVI Wiki and on the public project page:
https://collab.genivi.org/wiki/display/genivi/SysInfraEGLifecycleDef (restricted)
http://projects.genivi.org/node-state-manager/about (open)
All presentations of the concepts can be found using this link
Lifecycle Presentations
The code for the Node State Manager, the Node Startup Controller and the Node Health Monitor can be found in the GENIVI git:
http://git.projects.genivi.org/?p=lifecycle/node-state-manager.git
http://git.projects.genivi.org/?p=lifecycle/node-startup-controller.git
http://git.projects.genivi.org/?p=lifecycle/node-health-monitor.git
and you can contact me directly ([email protected])