April 29, 2006 DynAMOS -- SMTPS '06 1
On-the-Fly Kernel Updates for High-Performance Computing Clusters
Kristis Makris, Arizona State University
Kyung Dong Ryu, IBM T.J. Watson Research Center
Motivation
• Updating the kernel in high-performance computing (HPC) clusters requires downtime
  - Revenue loss in pay-per-use, time-sharing clusters
  - Disruption of long-lived parallel tasks
  - Process migration may not be possible
• Postponing updates has its price
  - Unpatched kernel security holes
  - Missed kernel specialization opportunities
  - Adaptive selection of which kernel subsystem to use; virtualization cannot help
• Parallel computing needs
  - Safe, unobtrusive updates (no system restart)
  - Temporary, reversible specialization of some nodes
  - Portable updating system (i386 + PowerPC)
Solution: Dynamic Kernel Updates
Approaches
• Adaptable OS
  - Specially crafted, like K42, VINO, Synthetix
  - Requires OS and application restructuring
• Dynamic code instrumentation
  - Zero kernel source modification (KernInst, GILK)
  - Basic block code interposition
  - Currently limited:
      • No procedure replacement
      • No autonomous kernel adaptability
      • No safe, complete subsystem update guarantees
Dynamic Updates Classification
• Updates can change
  - Userspace requirements: a security fix breaks existing applications that rely on the defect
  - Kernel external requirements: function signature changes (API changes)
  - Kernel internal requirements: global variables used by a function group (e.g., enlarging the copy buffer used in pipefs)
• Updating needs
  - State tracking: enlarge the copy buffer only for 2 processes; must adaptively enlarge the buffer and use the newer functions
  - State transfer: copy data from the old buffer to the new one
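A minimal user-space C sketch of the state-tracking and state-transfer needs above; it is not code from DynAMOS or pipefs. A hypothetical `struct pipe_buf` keeps its original size until a tracked process uses it, at which point a larger buffer is allocated and the pending bytes are copied from the old buffer into the new one. The struct, the function names, the two PIDs, and both buffer sizes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define OLD_BUF_SIZE 4096   /* assumed original buffer size */
#define NEW_BUF_SIZE 65536  /* assumed enlarged buffer size */

/* Hypothetical pipe state: a buffer and the number of bytes waiting in it. */
struct pipe_buf {
    char  *data;
    size_t size;
    size_t len;
};

/* State tracking: only these processes should trigger the larger buffer
 * (assumed PIDs standing in for "only 2 processes"). */
static const pid_t tracked_pids[] = { 101, 102 };

int is_tracked(pid_t pid)
{
    for (size_t i = 0; i < sizeof tracked_pids / sizeof tracked_pids[0]; i++)
        if (tracked_pids[i] == pid)
            return 1;
    return 0;
}

/* State transfer: allocate the larger buffer and copy the pending bytes.
 * Returns 0 on success; on allocation failure the old buffer is kept. */
int pipe_grow(struct pipe_buf *p, size_t new_size)
{
    assert(p->len <= p->size);
    if (new_size <= p->size)
        return 0;                      /* already large enough */
    char *bigger = malloc(new_size);
    if (bigger == NULL)
        return -1;
    memcpy(bigger, p->data, p->len);   /* copy data from old buffer to new */
    free(p->data);
    p->data = bigger;
    p->size = new_size;
    return 0;
}

/* Updated entry point: a pipe grows lazily, and only when a tracked
 * process uses it; pipes used only by other processes keep the old size. */
size_t pipe_capacity_for(struct pipe_buf *p, pid_t pid)
{
    if (is_tracked(pid))
        (void)pipe_grow(p, NEW_BUF_SIZE);  /* on failure, keep old size */
    return p->size;
}
```

Growing lazily on first use by a tracked process keeps the transfer off the update path itself; an eager variant would have to walk every existing pipe when the update is applied.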
Dynamic Update Types
• No safe update point needed
  - Update a read-only global variable (e.g., maximum number of open files)
  - Add a new variable used only by a single function
• Safe update point needed
  - Update the uid of an inode (guarded by a semaphore)
  - Add a new variable used by a function group (must update atomically)
• Non-quiescent resources
  - Update the kernel scheduler to use a different policy
• Datatype updates
  - Update functions that use the old datatype to use the new datatype
  - Maintain a shadow data structure that holds only the new fields, and update only the functions that use the new fields
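The shadow-data-structure approach for datatype updates can be sketched in C as an illustrative user-space model, not DynAMOS code. The original `struct file_v1` keeps its layout, so functions that were not updated still work; the new field lives in a separate shadow entry keyed by the original object's address, and only the updated function `file_read_v2` touches it. All type and function names, the new `reads` field, and the fixed-size lookup table are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical original datatype: its layout must not change, because
 * functions that were not updated still use it. */
struct file_v1 {
    int    fd;
    size_t pos;
};

/* Shadow structure holding only the field added by the update. */
struct file_shadow {
    const struct file_v1 *owner;  /* object this shadow extends */
    unsigned long         reads;  /* hypothetical new field */
};

#define MAX_SHADOWS 64            /* illustrative fixed-size table */
static struct file_shadow shadows[MAX_SHADOWS];
static size_t nshadows;

/* Find the shadow for obj, creating it on first use.
 * Returns NULL when the table is full. */
struct file_shadow *shadow_of(const struct file_v1 *obj)
{
    assert(obj != NULL);
    for (size_t i = 0; i < nshadows; i++)
        if (shadows[i].owner == obj)
            return &shadows[i];
    if (nshadows == MAX_SHADOWS)
        return NULL;
    shadows[nshadows].owner = obj;
    shadows[nshadows].reads = 0;
    return &shadows[nshadows++];
}

/* Original function: unchanged, unaware of the new field. */
void file_read_v1(struct file_v1 *f, size_t n)
{
    f->pos += n;
}

/* Updated function: uses the old fields plus the shadow's new field. */
void file_read_v2(struct file_v1 *f, size_t n)
{
    struct file_shadow *s = shadow_of(f);
    f->pos += n;
    if (s != NULL)
        s->reads++;
}
```

Because nothing in `struct file_v1` moves, old and new function versions can operate on the same objects at once. A fuller version would also drop a shadow entry when its owner is freed; that is not modeled here.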
DynAMOS System Architecture
• Distribute updates to cluster nodes
• Process updating requests from control station with framework
• Prepare updates to be applied
• Coordinate safe activation/removal
• Currently implemented for i386 uniprocessor Linux kernels 2.2-2.6
Execution Flow Redirection (1)
• Install trampoline at the beginning of the original function
  - Disable local processor interrupts
  - Flush I-cache
  - Use an indirect jump (jmp *)
  - Don't modify page permissions
• Divert execution to a redirection handler
• Original function can no longer be directly executed
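As an illustration of the trampoline step, the C sketch below encodes an i386 indirect jump (`jmp *[slot]`, opcode `FF 25` followed by a 32-bit absolute address) and writes it over the first six bytes of a function image, saving the bytes it overwrites. It operates on an ordinary byte buffer rather than live kernel text. The interrupt disabling and I-cache flush named on the slide appear only as comments, and the function names and the separate target slot are assumptions of this sketch, not details confirmed by the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRAMP_LEN 6  /* FF 25 + 4-byte absolute address */

/* Encode "jmp *[slot_addr]" for 32-bit mode: opcode FF /4 with ModRM 0x25
 * (mod=00, r/m=101) takes a 32-bit absolute memory operand, so the CPU
 * loads the jump target from the 4-byte word stored at slot_addr. */
void encode_indirect_jmp(uint8_t out[TRAMP_LEN], uint32_t slot_addr)
{
    out[0] = 0xFF;
    out[1] = 0x25;
    for (int i = 0; i < 4; i++)              /* little-endian address */
        out[2 + i] = (uint8_t)(slot_addr >> (8 * i));
}

/* Overwrite the start of a function image (here just a byte buffer) with
 * the trampoline, saving the original bytes for a clone or for removal. */
void install_trampoline(uint8_t *func, size_t func_len,
                        uint8_t saved[TRAMP_LEN], uint32_t slot_addr)
{
    assert(func_len >= TRAMP_LEN);
    uint8_t tramp[TRAMP_LEN];
    encode_indirect_jmp(tramp, slot_addr);
    /* In a kernel, this copy would run with local interrupts disabled and
     * be followed by an I-cache flush; neither is modeled here. */
    memcpy(saved, func, TRAMP_LEN);
    memcpy(func, tramp, TRAMP_LEN);
}
```

Because the jump is indirect, retargeting it later only means storing a new address in the 4-byte slot; the patched code bytes stay the same.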
Execution Flow Redirection (2)
• Create a separate redirection handler for each function
  - Customize from template
• Clone and relocate original function image
• Choose between active function versions with adaptation handler
  - Can execute different versions of functions in different process contexts
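A user-space C sketch of how a redirection handler might consult an adaptation handler to run different versions of one function in different process contexts. In DynAMOS the versions are cloned machine code reached through a trampoline; here they are ordinary C functions, which is a simplification. The names (`get_quota_v1`, `get_quota_v2`, `use_version`, the table size) are hypothetical, and the caller's PID is passed explicitly where a kernel handler would inspect the current task.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Two coexisting versions of a hypothetical kernel function. */
typedef int (*quota_fn)(int);
int get_quota_v1(int pages) { return pages; }      /* original */
int get_quota_v2(int pages) { return pages * 2; }  /* updated */

/* Per-process version choice consulted by the adaptation handler.
 * Processes without an entry keep running the original version. */
struct version_entry {
    pid_t    pid;
    quota_fn fn;
};
static struct version_entry table[16];
static size_t ntable;

void use_version(pid_t pid, quota_fn fn)
{
    assert(ntable < sizeof table / sizeof table[0]);
    table[ntable].pid = pid;
    table[ntable].fn = fn;
    ntable++;
}

/* Adaptation handler: decides which active version this context runs. */
quota_fn adaptation_handler(pid_t current)
{
    for (size_t i = 0; i < ntable; i++)
        if (table[i].pid == current)
            return table[i].fn;
    return get_quota_v1;
}

/* Redirection handler: where the trampoline would send callers of the
 * original function; it dispatches to the version chosen above. */
int get_quota_redirect(pid_t current, int pages)
{
    return adaptation_handler(current)(pages);
}
```

Usage: after `use_version(42, get_quota_v2)`, process 42 gets the updated behavior while every other process keeps the original one, which is the "temporary, reversible specialization" the Motivation slide asks for, scaled down to a single function.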
Function Cloning Benefits
• Unaltered stack when newer function is executed
  - No processor state saved on stack
• Autonomous kernel determination of update timeliness
  - Using adaptation handler
• Function-level instrumented applications
  - Basic blocks can be bypassed
  - Modifications developed in functions with original source language
Function Relocation
• Adjust relative branch instructions
• Replace ret instructions with jumps back to redirection handler
• Safely detect
  - Backward branches: point to code overwritten by trampoline
  - Outbound branches: jump to code outside function image
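The displacement arithmetic behind relocating a relative branch can be sketched in C. For an x86 `rel32` branch, the target is the address of the next instruction plus the displacement. When the function image is copied from `old_base` to `new_base`, a branch whose target lies inside the image needs no change (its target moves with it), while an outbound branch must be re-aimed at the same absolute target. The sketch also flags internal targets in the first `tramp_len` bytes, the code overwritten by the trampoline, which the slide says must be detected. Instruction decoding and `ret` replacement are not modeled, and all names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { BR_INTERNAL, BR_OUTBOUND } branch_kind;

typedef struct {
    branch_kind kind;
    int32_t     new_disp;         /* displacement to store in the clone */
    int         hits_trampoline;  /* target lies in bytes the trampoline overwrote */
} reloc_result;

/* Relocate one rel32 branch at insn_off (instruction length insn_len) in a
 * function of func_len bytes copied from old_base to new_base. */
reloc_result relocate_rel32(uint32_t old_base, uint32_t new_base,
                            uint32_t func_len, uint32_t tramp_len,
                            uint32_t insn_off, uint32_t insn_len,
                            int32_t disp)
{
    assert(insn_off + insn_len <= func_len);
    reloc_result r = { BR_INTERNAL, disp, 0 };

    /* Targets are relative to the next instruction; unsigned arithmetic
     * wraps modulo 2^32, matching rel32 semantics on i386. */
    uint32_t old_next = old_base + insn_off + insn_len;
    uint32_t target   = old_next + (uint32_t)disp;

    if (target - old_base < func_len) {
        /* Internal: target moves with the clone, displacement unchanged. */
        r.hits_trampoline = (target - old_base) < tramp_len;
    } else {
        /* Outbound: keep the same absolute target from the new location.
         * The cast assumes two's-complement conversion (GCC/Clang). */
        uint32_t new_next = new_base + insn_off + insn_len;
        r.kind = BR_OUTBOUND;
        r.new_disp = (int32_t)(target - new_next);
    }
    return r;
}
```

The check `target - old_base < func_len` uses unsigned wraparound so that targets below `old_base` also fall outside the range. How a flagged `hits_trampoline` branch should then be handled is not covered by the slides; this sketch only reports it.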
Applying Security Patches
• Openwall hardening changes for Linux 2.4.22
• Permission check when writing in named pipes
  - Updated open_namei function
  - No safe update point needed
• Permission check when following a symbolic link
  - Updated open_namei, vfs_link functions
  - Had to update inline function do_follow_link, used by link_path_walk
  - No need to update functions atomically
• Confirmed unauthorized access was denied
Applying Unobtrusive Fine-grained Cycle Stealing
• Linger-Longer system for Linux 2.2.19
  - Introduces a guest priority
  - New scheduling policy
• Updated schedule function in a 4-node cluster
• Confirmed guest processes were not consuming CPU time when host processes were active
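The guest-priority idea can be modeled with a small C selection function: any runnable host task is chosen before any guest task, so guests run only when the host side is idle. This assumes a strict host-over-guest rule, which is the property the slide's confirmation checks; the actual Linger-Longer policy and its changes to `schedule` may be more nuanced. `struct task`, `pick_next`, and the priority field are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum task_class { HOST, GUEST };

struct task {
    const char     *name;
    enum task_class cls;
    int             runnable;
    int             prio;      /* higher runs first within a class */
};

/* Pick the next task: any runnable host task beats every guest task;
 * guests get the CPU only when no host task is runnable. */
struct task *pick_next(struct task *tasks, size_t n)
{
    assert(tasks != NULL || n == 0);
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        struct task *t = &tasks[i];
        if (!t->runnable)
            continue;
        if (best == NULL
            || (t->cls == HOST && best->cls == GUEST)
            || (t->cls == best->cls && t->prio > best->prio))
            best = t;
    }
    return best;  /* NULL means the CPU stays idle */
}
```

Note that a guest's priority is never compared against a host's: even a high-priority guest waits, which is what keeps cycle stealing unobtrusive to the host's owner.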
Applying Adaptive Memory Paging for Efficient Gang-Scheduling
• Various adaptive memory paging policies for Linux 2.2.19 in a 4-node cluster
• Required modifications in kswapd, swap_out, rw_swap_page, swapin_readahead, filemap_nopage
• kswapd is a kernel thread that never exits
  - Beginning of function is never called again
  - Thread sleeps by calling interruptible_sleep_on
  - Insert interruptible_sleep_on_v2, forcing kswapd to exit
  - Start kswapd_v2
• Confirmed job switching time was reduced
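The kswapd replacement trick can be mimicked in user space with POSIX threads. In this sketch a daemon loops forever and blocks only in `sleep_on`; applying the update sets a flag so that the next call to `sleep_on` behaves like the slide's `interruptible_sleep_on_v2` and ends the thread with `pthread_exit`, after which a replacement daemon is started. The flag stands in for execution-flow redirection, and `pthread_exit` stands in for the kernel thread exiting; the slides do not say exactly how `interruptible_sleep_on_v2` forces the exit. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static bool pending;    /* a wakeup has been posted */
static bool v2_live;    /* update applied: sleeping now ends the thread */
static int  v1_rounds;  /* written only by the old daemon */
static int  v2_rounds;  /* written only by the new daemon */

/* The only place the daemon blocks. Once v2_live is set, it behaves like
 * the "_v2" sleep: the calling thread exits instead of waiting. */
static void sleep_on(void)
{
    pthread_mutex_lock(&lock);
    if (v2_live) {
        pthread_mutex_unlock(&lock);
        pthread_exit(NULL);              /* does not return */
    }
    while (!pending)
        pthread_cond_wait(&wake, &lock);
    pending = false;
    pthread_mutex_unlock(&lock);
}

/* Old daemon: like kswapd it never returns, and its entry point is never
 * reached again, so a trampoline at its start would have no effect. */
void *daemon_v1(void *arg)
{
    (void)arg;
    for (;;) {
        v1_rounds++;                     /* stand-in for one reclaim pass */
        sleep_on();
    }
}

/* Replacement daemon; a real one would also loop forever. */
void *daemon_v2(void *arg)
{
    (void)arg;
    v2_rounds++;
    return NULL;
}

/* Apply the update: make the old daemon exit at its next sleep, wait for
 * it to finish, then start the replacement. Returns 0 on success. */
int replace_daemon(pthread_t old, pthread_t *fresh)
{
    assert(fresh != NULL);
    pthread_mutex_lock(&lock);
    v2_live = true;
    pending = true;                      /* wake it if it is waiting */
    pthread_cond_broadcast(&wake);
    pthread_mutex_unlock(&lock);
    if (pthread_join(old, NULL) != 0)
        return -1;
    return pthread_create(fresh, NULL, daemon_v2, NULL) == 0 ? 0 : -1;
}
```

Build with `-pthread`. The old thread is joined before the new one starts, so the two daemon versions never run at the same time in this sketch.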
Overhead
• 29k footprint
• < 1 ns trampoline installation time
• 20 ns redirection handler overhead
• 2.3 s update on 2 GHz P4 (adaptive paging)
• 1-8% overhead (due to indirect jump)
Related Work
• Cluster management systems
  - Do not support dynamic kernel updates
• K42
  - Specially designed with hot-swappable capabilities
  - Requires quiescence for all updates
• Hicks' system
  - User-level software updates; requires recompilation
• KernInst, GILK, ATOM, EEL
  - Do not facilitate adaptive execution
  - Do not replace complete subsystems
On-going and Additional Work
• Ensure safe update reversal
  - Confirm quiescence in stack and program counter
• Update datatypes
  - Maintain shadow data structure of new fields
• Apply EPCKPT kernel-assisted checkpointing
• Adaptively enlarge pipefs buffer
• Apply Superpages support
• Apply Scalable TCP for high-speed WANs
• Automatically produce updates given a patch file
• Apply MOSIX
• Upgrade Linux kernel
Conclusion
• Dynamic kernel updates
  - Dynamic code instrumentation
  - Commodity operating system
  - Function cloning for adaptive execution
  - Multiple function versions can run concurrently
• Safe updates of non-quiescent subsystems
  - Scheduler, kernel threads
• Demonstrated updates
  - Adaptive memory paging for efficient gang-scheduling
  - Unobtrusive fine-grained cycle stealing
  - Public security fixes
• Small memory footprint, 1-8% overhead
Questions?