
Achieving Predictable Execution in COTS-based Embedded Systems

Stanley Bak†, Rodolfo Pellizzoni‡, Emiliano Betti†, Gang Yao†, John Criswell†, Marco Caccamo†, Russel Kegley]

University of Illinois at Urbana-Champaign, USA†, University of Waterloo, Canada‡, Lockheed Martin Corp., USA]

Abstract—Building safety-critical real-time systems out of inexpensive, non-real-time, Commercial Off-the-Shelf (COTS) components is challenging. Although COTS components generally offer high performance, they can occasionally incur significant timing spikes. To prevent this, we propose controlling the operating point of shared resources, for example main memory, to maintain it below its saturation limit. This is necessary because the low-level arbiters of these shared resources are not typically designed to provide real-time guarantees. Here, we discuss a novel system execution model, the PRedictable Execution Model (PREM), which, in contrast to the standard COTS execution model, coschedules, at a high level, components in the system which may access main memory, such as CPUs and I/O peripherals. To enforce predictable, system-wide execution, we argue that real-time embedded applications should be compiled according to a new set of rules dictated by PREM. To experimentally validate the proposed theory, we developed a COTS-based PREM testbed and modified the LLVM Compiler Infrastructure to produce PREM-compatible executables.

I. PREDICTABLE EXECUTION MODEL (PREM)

Building computer systems out of commercial off-the-shelf (COTS) components, as opposed to custom-designed parts, typically improves time-to-market and reduces system cost, while generally providing better performance. For real-time systems, however, one hurdle in the way of using COTS is transient timing spikes, which may occur when there is contention for shared resources. The low-level arbiter of a shared resource in a COTS system typically has no mechanism to deal with the timeliness of incoming requests, which may end up delaying more critical tasks, causing an unintended and undesirable priority inversion.

The PRedictable Execution Model (PREM) [1], in contrast to the standard COTS execution model, coschedules at a high level all active components in the system, such as CPU cores and I/O peripherals. Briefly, the key idea is to control when active components access shared resources, so that contention is implicitly resolved by the high-level coscheduler without relying on low-level, non-real-time arbiters. Here, we specifically focus our attention on contention at the level of the interconnect and main memory.

!"#$%&'()*"#+*,&-#.*/#0123&456&7&-#8/&,9'3:;9-931#

8/&,9'3:;-&#&%&'()*"#9"3&/7:-2#

<8=#$%&'()*"#

<:'>&#.&3'>&2#:",#/&?-:'&4&"32#

8&/9?>&/:-#,:3:#3/:"2.&/2#

4&4*/1##?>:2&#

&%&'()*"#?>:2&#

Fig. 1. A predictable interval is composed of 2 main phases: memory phase and execution phase. Peripherals are allowed to access the bus only during the execution phase.

A. Scheduling Memory Access

In order to schedule access to main memory at a high level, we propose the PREM execution model, where tasks running on the CPU are logically divided into two types of intervals: compatible intervals and predictable intervals. The compatible intervals are compiled and executed without further modifications; they are backwards compatible, but they should nonetheless be minimized in order to provide a good level of resource (main memory) utilization. Predictable intervals, on the other hand, are split into two phases: a memory phase and an execution phase. During the memory phase, the task can access main memory, typically loading the cache with the data that will be accessed during the rest of the interval. During an execution phase, no further main memory access is performed. Each interval executes for a fixed amount of time equal to its worst-case execution time, which simplifies system-wide scheduling.
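For illustration, the body of a predictable interval could be organized as follows, assuming a 64-byte cache line; the buffer, its size, and the checksum computation are placeholders rather than part of our implementation.

    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE_SIZE 64              /* assumed line size; platform-specific */

    static uint8_t working_set[16 * 1024];  /* placeholder data used by one interval */

    /* Memory phase: touch one byte per cache line so the whole working set
     * is resident in cache before the execution phase starts. */
    static void memory_phase(volatile const uint8_t *buf, size_t len)
    {
        for (size_t i = 0; i < len; i += CACHE_LINE_SIZE)
            (void)buf[i];
    }

    /* Execution phase: compute using only the cached data; no new main
     * memory traffic is generated as long as the working set fits in cache. */
    static uint32_t execution_phase(const uint8_t *buf, size_t len)
    {
        uint32_t checksum = 0;
        for (size_t i = 0; i < len; i++)
            checksum += buf[i];
        return checksum;
    }

    uint32_t run_predictable_interval(void)
    {
        memory_phase(working_set, sizeof working_set);
        return execution_phase(working_set, sizeof working_set);
    }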

With PREM, peripheral access to main memory is also controlled. Peripherals will only access main memory when a task is in its execution phase. In this way, only one active component at a time accesses main memory, avoiding the effects of the non-real-time interconnect and memory arbiters. Figure 1 shows an example of a single predictable interval.

A more complex scenario is shown in Figure 2, where two tasks (τ1 and τ2) run together with two related peripheral I/O flows (τ1^I/O and τ2^I/O). In this case, the input and output for each task are done in the adjacent I/O periods (double buffering), and I/O driver execution at the start and end of each real-time job is done using compatible intervals. Again, only one component accesses memory at any time.


!"#"$%&'(')*+'),+'('-''./00'1"2&'('),+'('-''3/(3'

4' (4' 04' ,4' *4' 54' 34'

6$278''9:8:'

"78278''9:8:'

&';"%2:<=>?'''6$8?@A:>'

&'%?%"@B'''2C:D?'

&'?E?;7<"$'''2C:D?'

τ1

τ2

τI/O1

τI/O2

s1,1 s1,2 s1,3

s2,2s2,1 s2,3 s2,4

&'F/G'H"I'

Fig. 2. A possible scheduling flow for two tasks.


B. Practical Aspects and Implementation

In order to perform PREM scheduling, practical solutions are needed both to divide intervals of execution into memory and execution phases and to control when peripherals access main memory. Furthermore, a scheduling coordinator is necessary to ensure the various components follow the system-wide schedule. We developed solutions for these concerns and evaluated our prototype system on several benchmarks.

From the task division aspect, we defined preprocessor macros which are used by a programmer to prefetch, into cache, data structures used during a predictable interval. These macros also send messages to the global scheduler component and include a loop to enforce a constant execution time for predictable intervals (equal to the worst-case execution time). This simplifies the process of maintaining a global schedule without affecting hard real-time guarantees, since the system is always provisioned for the worst-case execution time. We also created an LLVM [3] compiler pass to prefetch the code and stack used during a predictable interval. As part of future work, we plan to further extend our compiler modifications to automatically locate the data structures accessed in a predictable interval, instead of relying on the programmer.
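For illustration only, such macros could take roughly the following shape; notify_peripheral_scheduler() and read_cycle_counter() are placeholder names for the PCIe messaging and timing facilities, whose details are omitted here.

    #include <stdint.h>

    /* Placeholder hooks: how messages reach the peripheral scheduler and
     * how time is measured are platform details not shown here. */
    extern void notify_peripheral_scheduler(int interval_is_starting);
    extern uint64_t read_cycle_counter(void);

    /* Start of a predictable interval: record the start time and tell the
     * peripheral scheduler that this interval has begun. */
    #define PREM_INTERVAL_BEGIN(start)            \
        do {                                      \
            (start) = read_cycle_counter();       \
            notify_peripheral_scheduler(1);       \
        } while (0)

    /* End of a predictable interval: busy-wait until the interval's
     * worst-case execution time has elapsed, so every instance runs for
     * the same fixed duration, then notify the peripheral scheduler. */
    #define PREM_INTERVAL_END(start, wcet_cycles)                            \
        do {                                                                 \
            while (read_cycle_counter() - (start) < (uint64_t)(wcet_cycles)) \
                ;                                                            \
            notify_peripheral_scheduler(0);                                  \
        } while (0)

A task would then wrap each predictable interval as PREM_INTERVAL_BEGIN(t0); memory phase; execution phase; PREM_INTERVAL_END(t0, wcet_cycles);, with t0 a local uint64_t.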

For the peripherals, we created a real-time bridge [2] which, transparently to user-level applications, buffered a peripheral's input and output until the global scheduler permitted access to main memory. This is necessary since, if we simply stopped the communication on the bus without hardware modification, the internal buffers of the peripheral could fill up and lead to data loss. Our real-time bridge included a large external memory which could store the data for longer periods of time while main memory access was restricted. Although this is custom hardware added to the system, off-the-shelf peripherals do not need to be modified to use the real-time bridge. This was demonstrated using an unmodified TEMAC Ethernet component on a Xilinx ML507 FPGA. Software running on the main system saw a standard network interface card.
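Although the bridge is implemented in FPGA hardware, its buffering behavior can be sketched in software-like terms as follows; all interface names and the buffer size are placeholders used only to illustrate the buffer-then-forward idea.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder interfaces of the bridge's data path. */
    extern size_t peripheral_read(uint8_t *dst, size_t max_len); /* drain the peripheral FIFO */
    extern void   dma_to_main_memory(const uint8_t *src, size_t len);
    extern bool   scheduler_grant_asserted(void);                /* dedicated wire from the peripheral scheduler */

    #define BRIDGE_BUF_SIZE (4u * 1024u * 1024u)   /* large local memory on the bridge */
    static uint8_t bridge_buf[BRIDGE_BUF_SIZE];
    static size_t  buffered;

    /* Called continuously by the bridge: always drain the peripheral so its
     * internal buffers cannot overflow, but forward data to main memory only
     * while the peripheral scheduler grants access. */
    void bridge_poll(void)
    {
        buffered += peripheral_read(bridge_buf + buffered,
                                    BRIDGE_BUF_SIZE - buffered);

        if (buffered > 0 && scheduler_grant_asserted()) {
            dma_to_main_memory(bridge_buf, buffered);
            buffered = 0;
        }
    }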

[Figure 3 graphic: COTS motherboard (CPU, FSB, North Bridge, RAM, South Bridge, PCIe, PCI), three COTS peripherals each behind a Real-Time Bridge, and the Peripheral Scheduler]

Fig. 3. PREM system architecture


Finally, to coordinate the system we created a new system-aware hardware component. This peripheral scheduler receives short messages from the CPU over the PCIe bus and controls when peripherals can access the bus. In our implementation this was also done using a Xilinx ML505 FPGA. The preprocessor macros send start and end interval messages to the peripheral scheduler. The peripheral scheduler, at the appropriate times, allows the real-time bridges to transfer data to or from main memory using a dedicated wire.

A system-level view of the architecture is shown in Figure 3. Here, the peripheral scheduler is directly connected through external wires to each of the real-time bridges. It is also on the PCIe bus, which allows it to exchange messages with the task running on the CPU. These messages are sent when the task executes the PREM preprocessor macros.
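Likewise, the coordinator's control flow can be sketched as follows; the message codes, the interfaces, and the assumption that the memory-phase length is known from the offline schedule are illustrative placeholders rather than a description of the actual FPGA design.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BRIDGES 3

    /* Placeholder message codes sent by the PREM macros over PCIe. */
    enum { MSG_NONE = 0, MSG_INTERVAL_START = 1, MSG_INTERVAL_END = 2 };

    extern uint32_t read_cpu_message(void);                   /* last message from the CPU, or MSG_NONE */
    extern uint64_t read_cycle_counter(void);
    extern void     set_bridge_grant(int bridge, bool allow); /* drive the dedicated wire to one bridge */

    /* Assumed bound on the memory phase, taken from the offline schedule. */
    #define MEMORY_PHASE_CYCLES 100000u

    /* One polling step: peripherals are granted the bus only after the
     * current interval's memory phase is assumed to be over, the grant is
     * revoked when the interval ends, and only one bridge is enabled at a
     * time, cycling round-robin across intervals. */
    void peripheral_scheduler_step(void)
    {
        static bool     in_interval;
        static uint64_t interval_start;
        static int      current_bridge;

        switch (read_cpu_message()) {
        case MSG_INTERVAL_START:
            in_interval = true;
            interval_start = read_cycle_counter();
            break;
        case MSG_INTERVAL_END:
            in_interval = false;
            current_bridge = (current_bridge + 1) % NUM_BRIDGES;
            break;
        default:
            break;
        }

        bool allow = in_interval &&
                     (read_cycle_counter() - interval_start) >= MEMORY_PHASE_CYCLES;

        for (int i = 0; i < NUM_BRIDGES; i++)
            set_bridge_grant(i, allow && i == current_bridge);
    }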

To test our prototype, several benchmarks were compiled and executed according to PREM. These benchmarks demonstrated both that PREM could practically be used to increase system predictability and that it did not usually introduce significant overhead. The tasks where PREM did not perform well were ones where the exact amount of data that needed to be prefetched during the memory interval could not be determined at runtime (for example, programs with a widely varying output size). In real-time systems, however, we believe such programs are rare, since their execution times will likely also have a correspondingly wide range. Additional details about our experimental setup and the benchmarks are available in the PREM conference paper [1].

REFERENCES

[1] R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo, and R. Kegley, "A predictable execution model for COTS-based embedded systems," in Proceedings of the 17th Real-Time and Embedded Technology and Applications Symposium, Chicago, IL, USA, April 2011.

[2] S. Bak, E. Betti, R. Pellizzoni, M. Caccamo, and L. Sha, "Real-time control of I/O COTS peripherals for embedded systems," in Proc. of the 30th IEEE Real-Time Systems Symposium, Washington DC, Dec 2009.

[3] C. Lattner and V. Adve, "LLVM: A compilation framework for lifelong program analysis and transformation," in Proc. of the International Symposium on Code Generation and Optimization, Mar 2004.