Nautilus: A Case for Transforming Parallel Runtimes into Operating System Kernels
Kyle C. Hale and Peter Dinda | {k-hale,pdinda}@northwestern.edu
We present the Hybrid Runtime (HRT), a transformation of a traditional parallel runtime into a specialized operating system kernel, and the Hybrid Virtual Machine (HVM), which makes it possible to create VMs that are internally partitioned between a “regular OS” and an HRT. HRTs enjoy unfettered access to the hardware and define their own abstractions over that hardware, so there are no longer hidden avenues by which the OS can get in the way. We introduce Nautilus, a minimal OS kernel framework for creating these HRTs. Nautilus runs both on commodity x86_64 hardware and on the Xeon Phi coprocessor. We evaluate the Nautilus framework itself and describe two runtimes (NESL and Legion) that we transformed into HRTs and one (NDPC) that we built from scratch to support the HRT model.
Introducing Hybrid Runtimes
Nautilus Kernel Framework
The power of HRTs and the HVM
• HRTs can be created very quickly (we transformed all three runtimes into HRTs in ~4 months)
• HRTs do not have to suffer from OS noise and general non-determinism: they support only the OS constructs that they need
• With an HRT-based kernel framework like Nautilus, HRTs can start out with a simple, minimal skeleton for the execution environment, and runtime developers can add complexity as needed
• HRTs do not suffer from the “feature creep” that handicaps many general-purpose OSes
• More complex functionality, especially Linux compatibility, can be provided through the HVM, as in the figure below
References
[1] K. C. Hale and P. Dinda. A Case for Transforming Parallel Runtimes into Operating System Kernels. In Proceedings of the 24th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2015).
[2] J. Lange, P. Dinda, K. Hale, and L. Xia. An Introduction to the Palacios Virtual Machine Monitor Release 1.3. Tech. Rep. NWU-EECS-11-10, Dept. of EECS, Northwestern University (2011).
[3] D. Engler and M. Kaashoek. Exterminate All Operating System Abstractions. In Proceedings of the 5th Workshop on Hot Topics in Operating Systems (HotOS 1995).
Acknowledgements
This project is made possible by support from Sandia National Laboratories through the Hobbes Project, which is funded by the 2013 Exascale Operating and Runtime Systems Program under the Office of Advanced Scientific Computing Research in the DOE Office of Science. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
The Legion runtime as an HRT
• Legion is a complex parallel runtime that is gaining traction in the HPC community
• Legion shares many of the same design goals as an OS: we received several complaints from its developers that OS abstractions were getting in the way of the hardware
• We ported Legion to the HRT model on top of Nautilus and, with very reasonable effort, observed promising results (below)
[Figure: (a) Current Model: parallel app and parallel runtime over a general kernel on node HW, with user mode and kernel mode marked. (b) Hybrid Run-time Model: parallel app over a hybrid runtime (HRT) on node HW. (c) Hybrid Run-time Model Within a Hybrid Virtual Machine: a hybrid virtual machine (HVM) combines a specialized virtualization model (performance path: parallel app over an HRT) with a general virtualization model (legacy path: parallel app and parallel runtime over a general kernel).]
HRT structure with Nautilus
• Nautilus is designed to be simple, fast, and extensible
• Nautilus is a very thin layer on top of the hardware (figure below): the HRT has ultimate control over mechanism and policy
• Nautilus is in many ways similar to a libOS and is heavily influenced by the exokernel
[Figure: HRT structure with Nautilus. A parallel application and its runtime (the HRT) sit on Nautilus components (paging, threads, bootstrap, timers, IRQs, console, topology, synchronization, events), which sit on the hardware; user-mode and kernel-mode regions are marked.]
NESL and NDPC as HRTs
Summary
• We designed and built Nautilus, a kernel framework for creating hybrid runtimes (HRTs)
• We showed how HRTs can be extended to interact with a legacy environment using the hybrid virtual machine (HVM)
• We showed the feasibility and simplicity of creating HRTs using Nautilus, with three proofs of concept, one of which is a large and complex parallel runtime: Legion
• We showed that the HRT model can deliver promising performance, both from the HRT’s structure alone and from modifications that leverage direct access to the hardware
[Plot annotation: extremely lightweight threads in Nautilus]
thread creation in Linux and Nautilus
[Plot: event wakeup latency in cycles (axis 0 to 30,000) for Linux, Nautilus MWAIT, Nautilus condvar, and Nautilus w/ kick; annotation: specialized event wakeup (not possible in userspace)]
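The poster marks the specialized Nautilus event wakeup as not possible in userspace and does not describe how the Linux baseline was measured. As a point of reference, here is a minimal sketch of one way to measure a single userspace wakeup on Linux, assuming a POSIX condition variable and `CLOCK_MONOTONIC` timestamps (so it reports nanoseconds, not the plot's cycles). The names `struct event`, `waiter`, and `condvar_wakeup_ns` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* A one-shot event: a flag guarded by a mutex, plus a condition
   variable that the waiting thread sleeps on. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int signaled;
    struct timespec signal_time; /* taken by the signaler just before signaling */
    struct timespec wake_time;   /* taken by the waiter right after it wakes */
};

static int64_t elapsed_ns(const struct timespec *a, const struct timespec *b) {
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

static void *waiter(void *arg) {
    struct event *ev = arg;
    pthread_mutex_lock(&ev->lock);
    while (!ev->signaled) /* loop guards against spurious wakeups */
        pthread_cond_wait(&ev->cond, &ev->lock);
    clock_gettime(CLOCK_MONOTONIC, &ev->wake_time);
    pthread_mutex_unlock(&ev->lock);
    return NULL;
}

/* Returns one signal-to-wakeup latency in nanoseconds. */
int64_t condvar_wakeup_ns(void) {
    struct event ev;
    ev.signaled = 0;
    int rc = pthread_mutex_init(&ev.lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&ev.cond, NULL);
    assert(rc == 0);

    pthread_t t;
    rc = pthread_create(&t, NULL, waiter, &ev);
    assert(rc == 0);
    (void)rc;

    /* Give the waiter ~1 ms to block first; if it has not blocked yet,
       it simply finds the flag already set when it takes the lock. */
    struct timespec pause = {0, 1000000L};
    nanosleep(&pause, NULL);

    pthread_mutex_lock(&ev.lock);
    clock_gettime(CLOCK_MONOTONIC, &ev.signal_time);
    ev.signaled = 1;
    pthread_cond_signal(&ev.cond);
    pthread_mutex_unlock(&ev.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.lock);
    return elapsed_ns(&ev.signal_time, &ev.wake_time);
}
```

Because the waiter reads its timestamp only after reacquiring the mutex that the signaler released, the result is never negative. In a real measurement one would repeat the call many times and report a median.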
• Nautilus has a simple set of primitives that HRTs can adopt and extend for their own needs
• Primitives such as thread creation can be simpler given their intended use in specialized environments
• Nautilus has no notion of a process, only threads. The intent of this choice, and of many others, is to expose a simple interface to the hardware and to avoid imposing policy or cumbersome OS mechanisms
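The thread creation plot compares Nautilus with Linux pthreads, but the poster does not say how creation cost was measured. A minimal sketch of a Linux-side measurement, assuming one times n back-to-back `pthread_create()` calls with `CLOCK_MONOTONIC` (nanoseconds rather than the plot's cycles); `time_thread_creation_ns` and `threads_ran` are hypothetical names, not Nautilus APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Counts how many of the created threads actually ran. */
static atomic_int threads_ran;

/* Minimal thread body: record that it ran, then exit. */
static void *noop_thread(void *arg) {
    (void)arg;
    atomic_fetch_add(&threads_ran, 1);
    return NULL;
}

/* Times n back-to-back pthread_create() calls and returns the elapsed
   nanoseconds. Threads are joined afterwards, outside the timed region,
   so they are not leaked and teardown is not counted. */
int64_t time_thread_creation_ns(int n) {
    assert(n > 0);
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    assert(tids != NULL);
    atomic_store(&threads_ran, 0);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&tids[i], NULL, noop_thread, NULL);
        assert(rc == 0);
        (void)rc;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    free(tids);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

Each Linux thread carries a kernel task structure, its own stack mapping, and scheduler state, which is part of what a process-free, thread-only kernel like Nautilus can keep lighter.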
Language        SLOC
C               22697
C++             133
x86 Assembly    428
Scripting       706
Complexity of the Nautilus kernel framework
Language        SLOC
C++             133
C               636
Code added to Nautilus to support Legion, NDPC, and NESL as HRTs
NDPC as an HRT
Language        SLOC
Compiler
Perl            2000
yacc            236
lex             82
Runtime
C               2900
C++             93
x86 Assembly    477
NESL as an HRT
Language        SLOC
Compiler
Lisp            11005
Runtime
C               8853
lex             230
yacc            461
Item                                        Cycles (and exits)    Time
HRT core boot of Nautilus to main()         ~135 K (7 exits)      61 µs
Linux fork()                                ~320 K                145 µs
Linux exec()                                ~1 M                  476 µs
Linux fork() + exec()                       ~1.5 M                714 µs
HRT core boot of Nautilus to idle thread    ~37 M (~2300 exits)   17 ms
Booting Nautilus in an HVM + HRT setup is quicker than a Linux exec()
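The Cycles and Time columns of the boot table are related by t = N/f, where N is the cycle count and f the clock rate. The poster does not state f; dividing the table's own entries implies roughly 2.1 to 2.2 GHz, which is an inference, not a reported figure. The same entries quantify the caption's claim, with t_main denoting the HRT core boot of Nautilus to main(). Note that the claim holds for the boot to main(); the full boot to the idle thread (17 ms) is slower than a Linux exec().

```latex
\begin{aligned}
f &\approx \frac{N}{t}: \quad
  \frac{135\times10^{3}}{61\,\mu\mathrm{s}} \approx 2.2\,\mathrm{GHz}, \qquad
  \frac{1\times10^{6}}{476\,\mu\mathrm{s}} \approx 2.1\,\mathrm{GHz} \\
\frac{t_{\mathrm{exec}}}{t_{\mathrm{main}}} &= \frac{476\,\mu\mathrm{s}}{61\,\mu\mathrm{s}} \approx 7.8, \qquad
\frac{t_{\mathrm{fork+exec}}}{t_{\mathrm{main}}} = \frac{714\,\mu\mathrm{s}}{61\,\mu\mathrm{s}} \approx 11.7
\end{aligned}
```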
Left: simply by porting Legion to the HRT model, with very little optimization, we achieved performance parity with Linux, a heavily optimized and tuned OS
Right: to see how a very simple modification that uses direct hardware access might affect performance, we turned off interrupts during the Legion task selection loop. The results show promise for exploring further, more significant optimizations tailored to direct hardware access and the HRT model
latency of event wakeups for several implementations
lines of code for NDPC and NESL, respectively, as HRTs
[Plot: thread creation cost in cycles (axis 0 to 7,000,000) for Nautilus and Linux (pthreads), at x-axis values 2, 4, 8, 16, 32, and 64]
• NESL is a nested data-parallel language that allows programs with nested parallelism to be flattened for operation on vector machines
• NDPC is an implementation of a subset of NESL that compiles to C++ and uses fork/join parallelism instead of NESL’s flattened parallelism
• As a proof-of-concept, we ported both NESL and NDPC to the HRT model with Nautilus in a very reasonable amount of time
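The difference between NESL's flattened parallelism and NDPC's fork/join parallelism can be seen on a small nested sequence. The following minimal C sketch is not code from NESL or NDPC; it sums each inner sequence of [[1,2,3],[4],[5,6]] two ways. In the flattened version, the nested sequence becomes one flat data vector plus a segment-length descriptor, and a single loop over all elements does the work (on a vector machine this would be a segmented vector operation). In the fork/join version, one POSIX thread is forked per inner sequence and then joined. The function names and the `MAX_SEGS` limit are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Flattened: one pass over the flat data vector; seg_len[] says how many
   consecutive elements belong to each inner sequence (empty ones allowed). */
void segmented_sum_flat(const int *data, size_t n,
                        const size_t *seg_len, size_t nsegs, int *out) {
    for (size_t s = 0; s < nsegs; s++)
        out[s] = 0;
    size_t s = 0, end = 0; /* s = index of next segment; end = its end offset */
    for (size_t i = 0; i < n; i++) {
        while (i >= end)   /* advance past finished (or empty) segments */
            end += seg_len[s++];
        out[s - 1] += data[i];
    }
}

/* Fork/join: one task per inner sequence. */
struct seg_task { const int *start; size_t len; int sum; };

static void *sum_segment(void *arg) {
    struct seg_task *t = arg;
    int acc = 0;
    for (size_t i = 0; i < t->len; i++)
        acc += t->start[i];
    t->sum = acc;
    return NULL;
}

#define MAX_SEGS 64
void segmented_sum_forkjoin(const int *data, const size_t *seg_len,
                            size_t nsegs, int *out) {
    assert(nsegs <= MAX_SEGS);
    pthread_t tids[MAX_SEGS];
    struct seg_task tasks[MAX_SEGS];
    size_t pos = 0;
    for (size_t s = 0; s < nsegs; s++) { /* fork */
        tasks[s] = (struct seg_task){ data + pos, seg_len[s], 0 };
        pos += seg_len[s];
        int rc = pthread_create(&tids[s], NULL, sum_segment, &tasks[s]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t s = 0; s < nsegs; s++) { /* join */
        pthread_join(tids[s], NULL);
        out[s] = tasks[s].sum;
    }
}
```

For data {1, 2, 3, 4, 5, 6} with segment lengths {3, 1, 2}, both functions produce {6, 4, 11}. The flattened form turns irregular nesting into uniform work over one vector, while fork/join keeps the nesting and relies on cheap task creation, which is where lightweight threads matter.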
[Plot (left): Legion execution time in seconds (axis 0 to 6) versus Legion processor count (1 to 220 cores), Nautilus vs. Linux]
[Plot (right): speedup over Linux (axis 0% to 25%) versus Legion processor count (1 to 220 cores)]
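The poster reports turning off interrupts during Legion's task selection loop but shows neither the loop nor the mechanism. A minimal, hypothetical C sketch of the pattern, assuming an x86_64 kernel-mode environment such as an HRT: save the flags and disable interrupts, pick a ready task, then restore the flags. `irq_save`, `irq_restore`, `struct task`, and `select_task` are illustrative names, not Legion or Nautilus APIs. Because `cli` faults in user mode, the real instructions are compiled only when the hypothetical macro `HRT_KERNEL_MODE` is defined; otherwise no-op stubs let the sketch run in an ordinary process.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HRT_KERNEL_MODE
/* Kernel mode only: save RFLAGS, then clear the interrupt flag. */
static inline unsigned long irq_save(void) {
    unsigned long flags;
    __asm__ volatile("pushfq; popq %0; cli" : "=r"(flags) : : "memory");
    return flags;
}
/* Restore the saved RFLAGS (re-enables interrupts only if they were on). */
static inline void irq_restore(unsigned long flags) {
    __asm__ volatile("pushq %0; popfq" : : "r"(flags) : "memory", "cc");
}
#else
/* Hosted build: cli would fault outside ring 0, so these are no-op stubs. */
static inline unsigned long irq_save(void) { return 0; }
static inline void irq_restore(unsigned long flags) { (void)flags; }
#endif

/* Hypothetical ready-queue entry standing in for a runtime's task list. */
struct task { int id; int priority; int ready; };

/* Picks the highest-priority ready task with interrupts disabled, so no
   interrupt handler can perturb the selection loop. Marks the chosen task
   as taken and returns its index, or -1 if nothing is ready. */
int select_task(struct task *tasks, size_t n) {
    assert(tasks != NULL || n == 0);
    unsigned long flags = irq_save();
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].ready &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = (int)i;
    }
    if (best >= 0)
        tasks[best].ready = 0; /* claim it before interrupts come back */
    irq_restore(flags);
    return best;
}
```

Restoring the saved flags, rather than unconditionally re-enabling interrupts, keeps the helper safe to call from a context that already had interrupts disabled.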