Reflections on partitioning and resource sharing in MCS
Tullio Vardanega, [email protected]
HiPEAC CSW, Barcelona @ BSC, 15 May 2014
Understanding the MCS question
The advent of multicore processors creates a wave of opportunities and challenges in many application domains
Opportunity: transition from federated systems (with lots of harness, unused spares, workmanship hazards) to integrated systems with some degree of isolation
Challenge: the integration solutions adopted for single processors do not scale well
T. Vardanega (UNIPD) HiPEAC MC&R Workshop 2 of 29
Understanding the ex-ante /1
The integration solutions for single processors fall under the umbrella term of TSP (time and space partitioning)
Memory space is segregated by design and supervised at partition switches
Caches are flushed on partition switch so that there is no inter-partition interference
Time is allocated in slices to partitions, and partitions do what they please with their slices
Slice overruns are prevented by margin provisioning (insufficient science but sufficient confidence, or extreme scientific pessimism)
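The slice-based time allocation just described can be pictured as a cyclic schedule over a repeating major frame. A minimal sketch, with invented partition names, offsets and slice lengths:

```python
# Toy cyclic partition schedule in the TSP style described above.
# Partition names and window lengths are illustrative assumptions.

MAJOR_FRAME = 100  # ms

# (partition, offset within major frame, slice length) windows
schedule = [
    ("P1", 0, 40),
    ("P2", 40, 30),
    ("P3", 70, 30),
]

def partition_at(t_ms):
    """Return the partition owning the CPU at absolute time t_ms."""
    t = t_ms % MAJOR_FRAME
    for name, offset, length in schedule:
        if offset <= t < offset + length:
            return name
    return None  # idle gap in the frame, if any

# Windows must fit within the major frame
assert sum(length for _, _, length in schedule) <= MAJOR_FRAME

print(partition_at(75))   # P3
print(partition_at(140))  # P2 (second major frame)
```

The point of the table-driven form is that each partition's window is fixed by construction, which is what makes overrun supervision at partition switches possible.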
Understanding the ex-ante /2
The TSP model is typified by the IMA (integrated modular avionics) and its ARINC 653 interpretation
Understanding the ex-ante /3
The TSP model silently builds on the single-runner assumption, the intrinsic reality of single-CPU computing
Execution is strictly sequential; concurrency is obtained by transparent interleaving
The level of underutilization (caused by overprovisioned static allocation) is naturally upper bounded by the (limited) CPU capacity
The waste is more than balanced by the bounty of incremental development and qualification (IDQ)
Understanding IDQ /1
Industry needs and wants IDQ to be preserved on transition to multicore processors
The corresponding practice needs to separate the contained from the container
The contained is the application part (component), independently developed or supplied
The container is the cocoon that the system architecture wraps around the contained to provide guarantees of sufficient independence
Understanding the ex-ante /4
The single-runner assumption breaks on transition to multicore processors
Their HW architecture has numerous sources of interference among parallel co-runners
Difficult to avoid, short of imposing a single runner per time slice for the whole system (which is unspeakable)
Difficult to bound, short of massive overprovisioning that defeats the purpose (which is bad)
Understanding the ex-ante /5
(figure: sources of time interference in a multicore architecture, including the shared instruction cache and data cache)
Lots of shared HW, before even looking at SW!
Composability & compositionality /1
R_i = CS^{in} + C_i + S_i + CS^{out} + B_i + I^{clock}(R_i) + I^{int}(R_i) + \sum_{j \in hp(i)} \lceil (R_i + A_j + J_j) / T_j \rceil \cdot (CS^{in} + C_j + CS^{out})
where B_i is the blocking time (resource access protocol or kernel), CS^{in} and CS^{out} are the "in" and "out" context switches, I^{clock} and I^{int} are the interference from the clock and from interrupts, A_j is the "activation" jitter, J_j is the "wake-up" jitter, and S_i is the time to issue a suspension call
R_i is a compositional term; its RHS equation benefits from composable terms
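The response-time recurrence on this slide is solved by fixed-point iteration. A minimal sketch with illustrative parameters, which (as a simplifying assumption) folds the context-switch and suspension costs into constant overheads and treats clock/interrupt interference as constants:

```python
# Fixed-point iteration for a response-time recurrence of the kind
# shown above. All task parameters below are invented for illustration.
import math

def response_time(C, B, CS_in, CS_out, hp,
                  clock_ovh=0, int_ovh=0, horizon=10**6):
    """hp: list of (C_j, T_j, A_j, J_j) for higher-priority tasks.

    Returns the smallest fixed point of the recurrence, or None if it
    does not converge within the horizon (i.e. the task is infeasible).
    """
    base = C + B + CS_in + CS_out + clock_ovh + int_ovh
    R = base
    while R <= horizon:
        interference = sum(
            math.ceil((R + A_j + J_j) / T_j) * C_j
            for (C_j, T_j, A_j, J_j) in hp
        )
        R_new = base + interference
        if R_new == R:          # fixed point reached
            return R
        R = R_new
    return None

# Example: C=5, B=1, one higher-priority task with C=2, T=10, no jitter
print(response_time(C=5, B=1, CS_in=0, CS_out=0, hp=[(2, 10, 0, 0)]))  # 8
```

Compositionality shows up directly in the code: each term of the sum is computed from one task's parameters in isolation, which is exactly what breaks when co-runner interference is not a per-task property.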
Composability & compositionality /2
The single-runner model of computation of the von Neumann architecture allows systems to enjoy a good flair of time composability (TC)
Intrinsic at the HW level
Aided by design and implementation choices at SW level
Multicore processor architectures shatter those premises and TC can no longer be attained
The question then becomes whether TC has more shades of grey than just all-or-nothing
Ramifications /1
With IDQ, distinct parts may be developed at different levels of quality
The essential obligations are sanctioned by the customer in contractual arrangements: calling a given API, limiting the footprint, living within a bounded execution time budget, meeting all application requirements, …
Responded to by the supplier with the provision of factual evidence and assurance of given guarantees
As those guarantees may be insufficient, safeguarding measures must be adopted against violations, by architectural choices and run-time means
Understanding IDQ /2
(figure: components A and B, each wrapped in its own container, joined by connector AB)
Ramifications /2
The integrated-system setting cast to multicore processors has come to be known as the mixed-criticality systems (MCS) camp
With a too liberal use of the term "criticality", which has caused much confusion and numerous ill-founded speculations
We will return to the "criticality" confusion later in this presentation …
Ramifications /3
Time is the principal area of concern for misbehaviour in real-time systems
Two top-level goals
High schedulable utilization (aka maximum guaranteed performance)
Sufficient guarantees that the important services are always delivered in time (no hard deadline miss)
Two alternative solution architectures
Asymmetric guarantees: one-level scheduling with some run-time monitoring
Symmetric guarantees: multi-level scheduling or hierarchical execution with run-time enforcement
Asymmetric guarantees /1
Goal: low-importance, low-guarantee parts cannot cause higher-importance, higher-guarantee parts to miss their deadlines
This is the focus of a vast area of current MCS research
Based on the postulate that higher assurance corresponds to more conservative budgeting, hence longer WCET
The source of very complex ramifications
A. Burns and R. Davis, "Mixed Criticality Systems – A Review", TR, University of York 2013
www-users.cs.york.ac.uk/~burns/review.pdf
Asymmetric guarantees /2
Fundamentally
If a task executes longer than budgeted at the current "criticality level" L, L is raised to L+1 (higher)
The only tasks that can continue executing at that point are those that have residual budget at L+1
Every other task is immediately terminated, as their claiming CPU time would threaten the feasibility of tasks at level L+1
Math theory and associated scheduling algorithms ensure that schedulability is always guaranteed at the higher levels, in a cascading fashion
No industrial-quality results from that line of research to date (I say)
Lots of unsolved real-world issues and lots of doubts on the sanity of the underlying model (I say)
Termination, no return to lower "criticality levels", no functional dependence, …
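The mode-change rule criticised here can be made concrete in a few lines. A toy sketch, with invented task names and budgets (levels are indices into a per-task budget list; a task with no budget at the new level is dropped):

```python
# Sketch of the asymmetric-guarantee rule: on an overrun at level L,
# the system moves to L+1 and only tasks that still have a budget at
# L+1 keep running. Task names and budgets are illustrative.

tasks = {
    # name: per-level WCET budgets (index = criticality level)
    "flight_ctrl": [5, 8],   # also budgeted at level 1
    "telemetry":   [3],      # level-0 only: terminated on mode change
}

def surviving_tasks(level):
    """Tasks that still have a budget at the given criticality level."""
    return [t for t, budgets in tasks.items() if len(budgets) > level]

level = 0
# flight_ctrl executes beyond its level-0 budget of 5 -> raise the level
level += 1
print(surviving_tasks(level))  # ['flight_ctrl']
```

The sketch also makes the objections visible: there is no return path to level 0, and a terminated task's functional dependants are simply gone.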
Necessary clarifications
“Criticality” does not correspond (directly) to deadline, period, WCET
Some authors relate “criticality level” to SIL (safety integrity level)
But this is doubtful as the SIL concept is related to importance and confidence
P. Graydon and I. Bate. Safety assurance driven problem formulation for mixed-criticality scheduling. Workshop on Mixed-Criticality Systems @ RTSS 2013
www.cs.york.ac.uk/ftpdir/reports/2013/YCS/486/YCS-2013-486.pdf
Understanding SIL
(figure: SIL levels, as defined in the development process)
Survivability
The system capacity to continue to deliver essential services in the event of internal or external failure
Executing longer than budgeted is an error state arising from a development fault, but not necessarily a system failure
Missing a deadline might be a system failure if there is no residual utility in later completion
Survivability might require system reconfiguration into an acceptable degraded form
The assurance goal of reconfiguration for survivability is graceful degradation
The assurance goal of tolerating overruns is partitioning integrity
Reconfiguration has requirements that are poorly captured by real-time systems theory
Symmetric guarantees /1
Goal: every part stays within its assigned bounds regardless of any other considerations
This ambit is more traditional
Various solutions
Resource-reservation kernels
Hierarchical budget servers
Partitioning (dislocation)
Hypervisoring, with or without virtualization
No option can achieve true time isolation, short of using special HW
All must overprovision to compensate for intrinsic interference
Symmetric guarantees /2
Oh man, do I love partitioning! An example
H. Kim, A. Kandhalu, R. Rajkumar. A Coordinated Approach for Practical OS-Level Cache Management in Multi-core Real-Time Systems. ECRTS 2013, pp. 80-89.
Solution made of three parts
Cache reservation: to address inter-core cache interference (memory pages to tasks; tasks to cores; pages to cache partitions)
Cache sharing: to mitigate cache partition shortage
Cache-aware task allocation: to minimise CPU load while meeting all memory demands and feasibility requirements
Seeks feasible task-to-core allocations with as few cache partitions as possible, via a sub-optimal fix-point heuristic
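The flavour of such an allocation heuristic can be sketched in a few lines. This is not the paper's algorithm, only a toy first-fit placement in the same spirit: put tasks on cores so that per-core utilization stays feasible, while tracking the cache partitions each core ends up holding. All names and parameters are invented:

```python
# Toy cache-aware first-fit allocation (illustrative, not Kim et al.'s
# actual heuristic). tasks: list of (name, utilization, cache partitions
# the task's pages are mapped to).

def allocate(tasks, n_cores, util_bound=1.0):
    cores = [{"util": 0.0, "parts": set(), "tasks": []}
             for _ in range(n_cores)]
    # Place heaviest tasks first (a common first-fit-decreasing twist)
    for name, u, parts in sorted(tasks, key=lambda t: -t[1]):
        for core in cores:
            if core["util"] + u <= util_bound:
                core["util"] += u
                core["parts"] |= set(parts)
                core["tasks"].append(name)
                break
        else:
            return None  # this heuristic found no feasible placement
    return cores

demo = [("t1", 0.6, {0, 1}), ("t2", 0.5, {1}), ("t3", 0.3, {2})]
placement = allocate(demo, n_cores=2)
print([c["tasks"] for c in placement])  # [['t1', 't3'], ['t2']]
```

A real scheme would additionally iterate, as the slide says, to shrink the per-core partition sets, since co-locating tasks that share partitions is what reduces the total cache demand.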
Sharing SW resources
The premises on which single-runner solutions were based fall apart
Suspending is no longer conducive to earlier release of a shared resource: parallelism gets in the way
Boosting the priority of the lock holder does not help either: per-CPU priorities may not have global meaning
Having local and global resources causes suspending to become dangerous: local priority inversions may occur
Spinning protects against that hazard but wastes CPU cycles
Locking protocols: global
Locking protocols: partitioned
Three sources of blocking!
Priority boosting for earlier release of the resource: everyone pays for it, since contending tasks may be on any CPU
FIFO queuing for the contending tasks
Contention token: round-robin across CPUs
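The FIFO-queuing source of blocking has a simple back-of-envelope bound: with FIFO-ordered contention on m CPUs, a request can wait behind at most one request from each other CPU. A minimal sketch (c is the longest critical section on the resource; the formula is standard for FIFO spinning, the numbers are illustrative):

```python
# Per-request spin bound under FIFO-ordered contention on m CPUs:
# at most one critical section from each of the other m-1 CPUs.

def fifo_spin_bound(m_cpus, c_max):
    """Worst-case wait for one request, c_max = longest critical section."""
    return (m_cpus - 1) * c_max

print(fifo_spin_bound(4, 10))  # 30
```

This is why FIFO queuing is attractive for analysis: the bound depends only on the number of CPUs and the critical-section length, not on the number of contending tasks.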
Independence preservation /1
Independence preservation /2
Clusters of a given size
Suspension-based
Head of per-cluster FIFO participates in global FIFO; the per-cluster queue is FIFO+PRIO
Independence preserved by intra-cluster migration
Head of global FIFO (if pre-empted) can migrate to any CPU along the global FIFO and inherit the priority of the waiting task
Blocking is per request
MrsP /1
MrsP /2
For partitioned scheduling
Spinning-based: local wait, spinning at the local ceiling
Allows using uniprocessor response-time analysis
Blocking increased by parallelism, per resource
Earlier release obtained by migrating the lock holder (if pre-empted) to the CPU where the first contender in the global FIFO is currently spinning
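The "blocking increased by parallelism, per resource" accounting can be sketched as follows: in MrsP, each use of a resource r is charged an inflated cost, the number of CPUs hosting tasks that use r times the longest critical section on r, which is what lets the analysis stay in uniprocessor response-time form. A minimal sketch with invented numbers:

```python
# MrsP-style inflated resource cost: each access to r is charged as if
# one request per CPU that uses r were served ahead of it (FIFO order).
# CPU sets and critical-section lengths below are illustrative.

def mrsp_resource_cost(cpus_using_r, c_max):
    """Inflated per-access cost of resource r for uniprocessor RTA."""
    return len(cpus_using_r) * c_max

# Resource used by tasks on CPUs {0, 2, 3}; longest section 4 time units
print(mrsp_resource_cost({0, 2, 3}, 4))  # 12
```

The inflated term is pessimistic when contention is rare, but it keeps the per-CPU analyses independent of each other, which is the independence-preservation point of the protocol.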