Page 1: Preliminary Investigations into a Microkernel OSAL for cFSflightsoftware.jhuapl.edu/files/2017/cFS-Day/cFS17-B2-Presentation... · Preliminary Investigations into a Microkernel OSAL

Preliminary Investigations into a Microkernel OSAL for cFS

Gregor Peach, Joseph Espy, Zach Day, Gabriel Parmer, Alex Maloney

Gerald Fry*, Curt Wu*

The George Washington University
* Charles River Analytics

Acknowledgements: This material is based upon work supported by the National Science Foundation under Grant No. CNS 1149675, ONR Award No. N00014-14-1-0386, and ONR STTR N00014-15-P-1182 and N68335-17-C-0153. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or ONR.

Traditional Satellites

Fault tolerance
● Hardware redundancy
● Rad-hardened processors
● Single-core processors

CubeSats

Commodity hardware
● High clock speed
● Multi-core
● Limited hardware reliability features

Spare capacity + no HW reliability → SW reliability

How to most effectively use the parallelism?

How can we use extra computational capacity to increase fault tolerance?

Aspects of SW Fault Tolerance

Remediation: How do we return the system to a well-defined state?

Propagation: How do we contain the scope of the fault?

Detection: How do we determine when the system is in an erroneous state?

Core Flight System

[Figure: cFS layer diagram. Layers: mission-specific applications; general utility applications; Core Flight Executive functions; Operating System Abstraction Layer. Boxes: SW Bus, Tables, Mutex; HK, CS, S, ...; Sched, FS, Loader, Net, PSP.]

Core Flight System – Faults

[Figure: the cFS layer diagram (applications; Core Flight Executive functions; Operating System Abstraction Layer; PSP), shown in the context of faults.]

Core Flight System – Faults + POSIX

[Figure: the cFS layer diagram (applications; Core Flight Executive functions; Operating System Abstraction Layer; PSP), shown in the context of faults and POSIX.]

Composite μ-kernel

Small kernel (~7K LoC), real-time focus
● Focused on IPC between protection domains

Export policies to user-level components
● Scheduling, device drivers, memory management, FS, ...

[Figure: user-level components (NIC, Scheduler) above the kernel, which provides interrupt vectoring, memory mapping, and sync/async IPC.]

Composite OSAL/PSP

● Communication explicitly controlled by design
● IPC and scheduling are fast:

                  Composite     Linux
2-way IPC         700 cycles    600 (syscall), 3500 (pipes)
Thread dispatch   300 cycles    1800 (yield)

[Figure: component diagram. Boxes: SW Bus, Mutex, Tables; HK, S, CS; Sched, Load; FS, Driver; Net, NIC.]

Composite OSAL/PSP – Current

● Fixed-priority preemptive scheduling
● RAM-based FS
● Application loader:
  – Into shared protection domain
  – Into separate protection domains

[Figure: component diagram. Boxes: SW Bus, Mutex, Tables; HK, S, CS; Sched/load/FS/net.]
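As a rough illustration of what fixed-priority preemptive scheduling means, here is a minimal Python sketch. It is not the Composite scheduler: the class name, the "lower number = higher priority" convention, and the priorities given to the thread names are all assumptions made for the example.

```python
import heapq


class FixedPriorityScheduler:
    """Toy fixed-priority preemptive scheduler (lower number = higher priority)."""

    def __init__(self):
        self._ready = []     # min-heap of (priority, seq, name)
        self._seq = 0        # tie-breaker: FIFO among equal priorities
        self.running = None  # (priority, seq, name) or None

    def make_ready(self, name, priority):
        """A thread becomes runnable; returns the name of the thread now running."""
        entry = (priority, self._seq, name)
        self._seq += 1
        if self.running is None:
            self.running = entry
        elif priority < self.running[0]:
            # Preemption: the running thread goes back to the ready queue.
            heapq.heappush(self._ready, self.running)
            self.running = entry
        else:
            heapq.heappush(self._ready, entry)
        return self.running[2]

    def block_running(self):
        """The running thread blocks; dispatch the best ready thread, if any."""
        self.running = heapq.heappop(self._ready) if self._ready else None
        return self.running[2] if self.running else None
```

A thread that becomes ready only displaces the running thread when its priority is strictly higher; priorities never change at run time, which is what makes the policy "fixed".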

Composite OSAL/PSP – Current

● Lines of C code: < 4000 LoC
● OSAL unit tests: > 89% successful
  – oscore/osfile/osfilesys/osloader, 15% not relevant (OS call failure)
● In progress:
  – serialization/deserialization of OSAL arguments
  – increasing application support
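When an application runs in a separate protection domain, the arguments of an OSAL call have to be flattened into a buffer before they cross the IPC boundary. The sketch below shows the general idea in Python. The wire format, the call id, and the function names are hypothetical and are not taken from the Composite OSAL.

```python
import struct

# Hypothetical wire format (an assumption for illustration):
# little-endian u16 call id, i32 flags, u16 path length, then the path bytes.
HEADER = struct.Struct("<HiH")


def serialize_call(call_id: int, flags: int, path: str) -> bytes:
    """Flatten the arguments into one buffer that can cross an IPC boundary."""
    encoded = path.encode("utf-8")
    return HEADER.pack(call_id, flags, len(encoded)) + encoded


def deserialize_call(buf: bytes):
    """Rebuild the arguments on the server side, validating the length field."""
    call_id, flags, path_len = HEADER.unpack_from(buf, 0)
    body = buf[HEADER.size:]
    if len(body) != path_len:
        raise ValueError("path length does not match header")
    return call_id, flags, body.decode("utf-8")
```

Carrying an explicit length and checking it on the receiving side matters here because the receiver cannot trust pointers or sizes supplied by a different protection domain.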

Watchdog Timer

● Applications periodically declare successful execution
● On every watchdog timer expiry (1-10 seconds):
  – Have all applications and system components checked in?
  – No: reboot!

[Figure: Detection / Remediation diagram, marking: Watchdog Timer, Reboot.]
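The check-in protocol above can be sketched in a few lines of Python. This is only an illustration: the class, the `on_period` helper, and the `reboot` callback are hypothetical names, and a real watchdog would be driven by a hardware timer rather than an explicit call.

```python
class Watchdog:
    """Tracks which expected components have checked in during the current period."""

    def __init__(self, expected):
        self.expected = set(expected)
        self.checked_in = set()

    def check_in(self, name):
        """Called by a component to declare successful execution."""
        if name in self.expected:
            self.checked_in.add(name)

    def expire(self):
        """Called once per period: return the missing components and start a new period."""
        missing = self.expected - self.checked_in
        self.checked_in = set()
        return missing


def on_period(watchdog, reboot):
    """Timer handler: reboot if anything failed to check in. Returns True if healthy."""
    missing = watchdog.expire()
    if missing:
        reboot(sorted(missing))
        return False
    return True
```

Note that detection latency is bounded by the timer period, and the only remediation is a full reboot, which is the coarse baseline the later slides try to improve on.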

Redundant Execution

[Figure: Detection / Remediation diagram, marking: Double Modular Redundancy, Triple Modular Redundancy.]

[Figure: replicated copies of the cFS stack (SW Bus, Tables, Mutex; HK, CS, S, ...; Sched, FS, Load, Net, PSP) connected to a Voter.]

Composite Voter (in progress)
● < 800 LoC in Rust
● Utilize high-performance IPC + scheduling
● Design: minimize memory footprint and CPU footprint
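To make the difference between the two redundancy levels concrete, here is a minimal voting sketch in Python. It is not the Composite voter (which the slide says is written in Rust); the function names and the choice of comparing raw output bytes are assumptions for illustration.

```python
from collections import Counter


def dmr_compare(a, b):
    """Double modular redundancy: two replicas can detect a mismatch,
    but cannot tell which replica is wrong."""
    return a == b


def tmr_vote(outputs):
    """Triple modular redundancy: return (majority_value, faulty_indices).

    Raises RuntimeError if no two replicas agree.
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority among replicas")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty
```

With two replicas a disagreement only signals that something went wrong; with three, the majority output can be forwarded and the disagreeing replica identified for recovery.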

Checkpoint/Restore

[Figure: Detection / Remediation diagram, marking: Checkpoint/Restore.]

[Figure: timeline of snapshots of the cFS stack (SW Bus, Tables, Mutex; HK, CS, S, ...; Sched, FS, Load, Net, PSP), with Checkpoint and Restore points marked.]

Composite Checkpoint/Restore

              Composite*   Linux/CRIU*   Xen+
Checkpoint    0.2 ms       800 ms        8 s
Restore       0.2 ms       500 ms        10 s

* 1 MB image    + 512 MB image

Checkpoint/restore time increases at the rate of memcpy
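The "rate of memcpy" remark suggests that a checkpoint is essentially a bulk copy of a component's memory image. The Python sketch below models that idea with a byte buffer; the `Component` class and its methods are hypothetical and say nothing about how Composite actually tracks or copies memory.

```python
class Component:
    """Stand-in for a component whose state lives in one memory image."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self._snapshot = None

    def checkpoint(self):
        # One bulk copy of the image; the cost grows linearly with its size.
        self._snapshot = bytes(self.memory)

    def restore(self):
        if self._snapshot is None:
            raise RuntimeError("no checkpoint taken")
        # Overwrite in place so existing references to `memory` stay valid.
        self.memory[:] = self._snapshot
```

Because both operations are a single copy of the image, doubling the image size roughly doubles the time for each, which matches the linear scaling the slide describes.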

Computational Crash Cart

Recover system-level components upon failure
● Record summary of component communications
● Reboot component + re-establish state

Focus on real-time
● 10s of microseconds recovery time

Complementary to application-level reliability
● Checkpoint/Redundant execution

[Figure: Detection / Remediation diagram, marking: Computational Crash Cart.]
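The record-then-replay idea can be sketched as follows. This is a simplified Python illustration, not the published mechanism: the `LockService` component, the `RecoverableStub` wrapper, and the shape of the summary are all assumptions chosen to keep the example small.

```python
class LockService:
    """Stand-in for a system-level component with internal state."""

    def __init__(self):
        self.owners = {}

    def take(self, lock, client):
        self.owners[lock] = client

    def release(self, lock):
        self.owners.pop(lock, None)


class RecoverableStub:
    """Sits on the communication path and keeps a summary of it."""

    def __init__(self):
        self.server = LockService()
        self.summary = {}  # lock -> client: only what is needed to rebuild state

    def take(self, lock, client):
        self.server.take(lock, client)
        self.summary[lock] = client

    def release(self, lock):
        self.server.release(lock)
        self.summary.pop(lock, None)

    def recover(self):
        # "Reboot" the component to a known-good initial state...
        self.server = LockService()
        # ...then replay the summary to re-establish its state.
        for lock, client in self.summary.items():
            self.server.take(lock, client)
```

Keeping a summary rather than a full log bounds both the memory used and the replay time, which is what makes a real-time recovery bound plausible.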

Monitoring for Detection

Monitor/log system interactions and timing
● API calls, context switches, interrupts, …

Process log
● Interactions deviate from system model?
● Interactions statistically deviate from historically correct behaviors?

[Figure: Detection / Remediation diagram, marking: Monitoring + ML, Composite Monitoring.]
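The two questions asked of the log can each be approximated with a very small check. The Python sketch below is illustrative only: the transition-set model, the three-sigma threshold, and the event names are assumptions, and the monitoring work referenced by the talk uses richer models.

```python
from statistics import mean, stdev


def violates_model(events, allowed):
    """Return the first (prev, next) transition not in the system model, else None."""
    for prev, nxt in zip(events, events[1:]):
        if (prev, nxt) not in allowed:
            return (prev, nxt)
    return None


def is_statistical_outlier(history, sample, k=3.0):
    """Flag a sample more than k standard deviations from the historical mean.

    `history` must hold at least two known-good measurements.
    """
    mu, sigma = mean(history), stdev(history)
    return abs(sample - mu) > k * sigma
```

The first check needs an explicit model of legal interactions; the second needs only a trace of known-good behavior, at the cost of possible false positives and negatives.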

How can we effectively use the parallelism of commodity CPUs?

Composite + Parallelism

Kernel designed to be lock-less
● Kernel operations are all wait-free → real-time
● IPC core-local, or inter-core

[Figure: component diagram. Boxes: HK, S, CS; FS, Driver; Net, NIC; SW Bus, Mutex, Tables.]

Composite parallelism orchestration

Example: OpenMP fork/join parallelism
● 2-40x decrease in worst-case inter-core communication latencies
● Up to 40 cores, across 4 sockets
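For readers unfamiliar with the fork/join pattern, the following Python sketch shows its structure: work is split into tasks (fork), run by a pool of workers, and combined once all tasks finish (join). It uses the standard `concurrent.futures` module rather than OpenMP, and the function name and chunking scheme are assumptions. Python threads share an interpreter lock, so this shows the shape of the pattern, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_join_sum(data, workers=4):
    """Fork: one task per chunk. Join: wait for every partial sum, then combine."""
    if not data:
        return 0
    chunk = -(-len(data) // workers)  # ceiling division
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Join point: consuming the map iterator waits for all tasks.
        partials = list(pool.map(sum, chunks))
    return sum(partials)
```

The join is where inter-core communication latency shows up: the combining thread cannot proceed until the slowest worker has reported, so worst-case latency bounds the whole region.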

CubeSats: Fresh View on cFE

How can we effectively
● utilize new resource availability, and
● provide SW fault tolerance?

→ Composite OSAL enables new options

Where do we go from here?
● Need your feedback...

? || /* */

● Computational Crash Cart, Checkpointing: http://www2.seas.gwu.edu/~gparmer/publications/rtss13_c3.pdf
● Model-based event monitoring: http://www2.seas.gwu.edu/~gparmer/publications/rtas15cmon_extended.pdf
● ML-based event monitoring: http://www2.seas.gwu.edu/~gparmer/publications/certs16caml.pdf
● Micro-benchmarks and virtualization: http://www2.seas.gwu.edu/~gparmer/publications/rtss17tcaps.pdf
● Lock-free, predictable kernel: http://www2.seas.gwu.edu/~gparmer/publications/rtas15speck.pdf
● OpenMP Fork/Join parallelism: http://www2.seas.gwu.edu/~gparmer/publications/rtas14_fjos.pdf