Rebootless Kernel Updates
Srivatsa S. Bhat
University of Washington
3 Dec 2018
Why are reboots undesirable?
• Downtime:
  • Shutdown + Boot + App startup
• Loss of state (e.g., network connections)
• Loss of results from long-running processes
• Unexpected complications
Why do kernel updates need rebooting?
• Kernel manages hardware
  • Driver updates may require re-init of hardware
• Userspace programs need kernel services
  • System calls, signals, IPC, etc.
Why would you want live kernel updates?
• Minimal service disruption
• Apply security (CVE) fixes ASAP without scheduled maintenance windows
• Avoid application start-up times following OS updates
Adding kernel code on the fly
• Loadable kernel modules
Live kernel updates wishlist
• Ability to fix bugs/vulnerabilities in any part of the kernel (both core + module code)
• Small update latency (say, < 10 seconds)
• Ability to roll back on update failure
• Minimal programmer effort to tailor fixes to live update scenarios
Live update approaches for Linux
• Ksplice (MIT/Oracle)
• kGraft (SUSE)
• kpatch (Red Hat)
• Livepatch (Upstream) [inspired by kGraft + kpatch]
Ksplice
• Works at the level of object code
• Function-level code replacement
• Latency: 0.7 milliseconds
• Workflow:
  • Generate binary replacement code using pre-post differencing
  • Resolve symbols and verify safety using run-pre matching
  • Use stop-machine for quiescence and perform code update
Ksplice: Generating replacement code using pre-post differencing
• Kernel’s source code → pre obj files
• Kernel’s source code + source patch → post obj files
• Binary diff of pre and post obj files → list of functions that differ
• Extract functions that differed → post code functions that differed → processed post obj file
• Linker: processed post obj file + generic kernel module → primary module
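The differencing step in this flow can be sketched as a toy model. It assumes each object file is represented as a hypothetical dict mapping function names to compiled bytes; real Ksplice works on ELF object files and has to deal with relocations, which this sketch ignores. The function and variable names are made up for illustration.

```python
from typing import Dict, List


def functions_that_differ(pre: Dict[str, bytes], post: Dict[str, bytes]) -> List[str]:
    """Return names of functions whose object code is new or changed in `post`."""
    return sorted(name for name, code in post.items() if pre.get(name) != code)


def extract_replacements(pre: Dict[str, bytes], post: Dict[str, bytes]) -> Dict[str, bytes]:
    """Collect the post-build code of every differing function (the update payload)."""
    return {name: post[name] for name in functions_that_differ(pre, post)}


# Hypothetical object files: byte strings stand in for compiled function bodies.
pre_obj = {"do_open": b"\x55\x89\xe5\x01", "do_close": b"\x55\x89\xe5\x02"}
post_obj = {"do_open": b"\x55\x89\xe5\xff", "do_close": b"\x55\x89\xe5\x02"}

# Only the function touched by the patch ends up in the payload.
payload = extract_replacements(pre_obj, post_obj)
```

Comparing compiled output rather than source means that anything the compiler changed as a side effect of the patch (for example, a caller that inlined the patched function) would also show up in the diff.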
stop-machine framework in Linux
• Mechanism to run a given function on a given CPU with the rest of the machine stopped!
• “Stopper threads” created for each CPU during boot
  • Have highest priority in the system
  • Execute only in kernel mode
  • Typically in non-runnable state
• stop-machine flow:
  • Mark all per-CPU stopper threads as runnable
  • Each stopper thread preempts userspace and hogs the CPU
  • Interrupts disabled on each CPU
  • Runs the requested function on the specified CPU
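The flow above can be imitated in userspace as a rough analogy. This sketch assumes one Python thread per "CPU" and uses barriers in place of the kernel's stopper-thread state machine; it does not model priorities or interrupt disabling, and the function name is borrowed only for readability.

```python
import threading
from typing import Callable, List


def stop_machine(fn: Callable[[], None], target_cpu: int, num_cpus: int) -> List[str]:
    """Toy model: run `fn` on `target_cpu` while every other CPU stays parked."""
    log: List[str] = []
    log_lock = threading.Lock()
    all_stopped = threading.Barrier(num_cpus)   # every CPU has been taken over
    fn_done = threading.Barrier(num_cpus)       # the requested function finished

    def stopper(cpu: int) -> None:
        all_stopped.wait()          # nobody proceeds until all CPUs are held
        if cpu == target_cpu:
            fn()
            with log_lock:
                log.append(f"cpu{cpu}: ran function")
        fn_done.wait()              # other CPUs stay parked until fn returns
        with log_lock:
            log.append(f"cpu{cpu}: released")

    threads = [threading.Thread(target=stopper, args=(cpu,)) for cpu in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


patched: List[str] = []
events = stop_machine(lambda: patched.append("new code installed"), target_cpu=1, num_cpus=4)
```

The two barriers capture the key property: the function only starts after every CPU has stopped, and no CPU resumes normal work until it has finished.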
kGraft
• Replaces entire functions
• Uses ftrace to perform code patching
• Process-by-process transition to new kernel code:
  • Old vs New Universe
  • Band-Aid functions that understand both old and new layouts of data structures
  • Uses fake signals to force “slow” processes to transition
• Needs special care to deal with:
  • Kernel threads
  • Interrupt handlers
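The old/new universe idea can be illustrated with a small model. It assumes a per-task flag and a redirect stub that picks the function version from that flag; the `Task` class, the two function versions, and the choice of "safe point" are hypothetical stand-ins, not kGraft's actual data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    in_new_universe: bool = False   # per-task flag: which function version to use


def old_version(x: int) -> int:
    return x + 1        # stands in for the original function


def new_version(x: int) -> int:
    return x + 2        # stands in for the patched replacement


def call_patched(task: Task, x: int) -> int:
    """Redirect stub: choose the implementation from the caller's universe."""
    return new_version(x) if task.in_new_universe else old_version(x)


def cross_kernel_boundary(task: Task) -> None:
    """A task switches universes at a safe point, e.g. when returning to userspace."""
    task.in_new_universe = True


def transition_complete(tasks: List[Task]) -> bool:
    return all(t.in_new_universe for t in tasks)


tasks = [Task("shell"), Task("daemon")]
cross_kernel_boundary(tasks[0])          # "shell" reached a safe point and migrated
results = [call_patched(t, 10) for t in tasks]
```

Until `transition_complete` holds, both versions are live at once, which is why a task that rarely reaches a safe point has to be nudged along.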
kpatch
• Similar to kGraft for the most part
• A fundamental difference from kGraft:
  • Uses stop-machine for quiescence:
    • Examine kernel stacks of all processes with machine stopped.
    • If function not on any stack, proceed to patch.
  • Can’t patch functions always found on the stack
    • E.g., schedule()
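The stack check can be written down as a short sketch. It assumes a hypothetical snapshot of every task's kernel stack, taken while the machine is stopped, given as a dict from task name to a list of function names; the task and frame names are invented for the example.

```python
from typing import Dict, List, Set


def safe_to_patch(stacks: Dict[str, List[str]], targets: Set[str]) -> bool:
    """Allow patching only if no task's kernel stack contains any function
    that is about to be replaced."""
    return all(targets.isdisjoint(frames) for frames in stacks.values())


# Hypothetical snapshot of kernel stacks, innermost frame first.
stacks = {
    "task_a": ["schedule", "do_nanosleep", "sys_nanosleep"],
    "task_b": ["schedule", "pipe_read", "vfs_read", "sys_read"],
}

can_patch_open = safe_to_patch(stacks, {"do_open"})       # not on any stack
can_patch_read = safe_to_patch(stacks, {"vfs_read"})      # task_b is inside it
can_patch_sched = safe_to_patch(stacks, {"schedule"})     # every sleeping task is inside it
```

The last case shows why a function like schedule() is a problem for this scheme: sleeping tasks always have it on their stacks, so the check never passes.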
Livepatch
• Best of both kGraft and kpatch
• Consistency model:
  • Supports both stop-machine and process-by-process transition
• Stack traces used to be unreliable
  • Assembly routines may not set up stack frames
  • Fixed by ORC unwinder + objtool (stack validation)
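One way to picture combining the two earlier ideas is to apply a kpatch-style stack check to each task individually and let tasks switch over one at a time, kGraft-style. The sketch below is an illustration under that assumption, not the upstream implementation; the function name and stack snapshots are hypothetical. It also shows why reliable stack traces matter: a wrong trace would let a task switch while still inside the old function.

```python
from typing import Dict, List, Set


def transition_pass(stacks: Dict[str, List[str]], switched: Set[str], targets: Set[str]) -> bool:
    """One pass: switch each not-yet-switched task whose stack is free of the
    functions being patched. Returns True once every task has switched."""
    for task, frames in stacks.items():
        if task not in switched and targets.isdisjoint(frames):
            switched.add(task)
    return switched == set(stacks)


targets = {"vfs_read"}
switched: Set[str] = set()
stacks = {
    "task_a": ["schedule", "sys_nanosleep"],
    "task_b": ["schedule", "vfs_read", "sys_read"],
}

first = transition_pass(stacks, switched, targets)    # task_b is still inside vfs_read
stacks["task_b"] = ["schedule", "sys_nanosleep"]      # later, task_b has left vfs_read
second = transition_pass(stacks, switched, targets)
```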
Challenges for Livepatch / similar mechanisms
• Data-structure / semantic changes
  • (Partially) solved using shadow data structures
• Changes to initialization routines
• Changes to static variables
• Dealing with compiler optimizations
• Patching hand-written assembly
• Handling changes in locking rules
• Patching modules that are not yet loaded
• Patching patched kernels
• Reverting live patches in case of failures
• …
• Undecidability: In the general case, can’t prove that patch + state transition leads to valid state.
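The shadow data-structure idea from the first bullet can be sketched as a side table that attaches extra fields to existing objects without changing their layout. The helper names, the `Connection` class, and the field id below are hypothetical and only loosely modeled on the concept; they are not the kernel's API.

```python
from typing import Any, Dict, Optional, Tuple

# Side table keyed by (identity of the original object, shadow field id).
_shadow: Dict[Tuple[int, int], Any] = {}


def shadow_alloc(obj: object, field_id: int, value: Any) -> Any:
    """Attach a new field to `obj` without changing its layout."""
    return _shadow.setdefault((id(obj), field_id), value)


def shadow_get(obj: object, field_id: int) -> Optional[Any]:
    """Return the shadow field, or None for objects created before the patch."""
    return _shadow.get((id(obj), field_id))


def shadow_free(obj: object, field_id: int) -> None:
    """Drop the shadow field when the parent object is destroyed."""
    _shadow.pop((id(obj), field_id), None)


class Connection:           # hypothetical pre-existing structure, layout fixed
    def __init__(self, peer: str) -> None:
        self.peer = peer


TIMEOUT_FIELD = 1           # hypothetical id for the field a patch wants to add
old_conn = Connection("10.0.0.1")    # existed before the patch: no shadow field
new_conn = Connection("10.0.0.2")    # created by patched code
shadow_alloc(new_conn, TIMEOUT_FIELD, 30)
```

Patched code has to handle the `None` case for objects that predate the patch, which is one reason the slide calls the solution only partial.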
“Seamless” kernel updates
• Achieved via a combination of:
  • Kexec – exec a new kernel image from a running kernel
  • CRIU – Checkpoint Restore In Userspace
• Approach:
  • Similar to hibernation, but more generic
  • Checkpoint all userspace state using CRIU to disk
    • Kernel-version agnostic checkpointed state/format
  • Kexec into new kernel
  • Restore all userspace from checkpointed image
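The sequence in the approach above can be laid out as a command plan. This sketch only builds the command lines and does not run them; the PIDs, image directory, and kernel paths are hypothetical, and a real setup would need more (extra CRIU options for specific workloads, a clean shutdown of services before the jump, and a boot-time hook to trigger the restore).

```python
from typing import List


def plan_update(pids: List[int], image_root: str, kernel: str, initrd: str) -> List[List[str]]:
    """Build (but do not run) the commands for a checkpoint -> kexec -> restore cycle."""
    steps: List[List[str]] = []
    for pid in pids:
        # Checkpoint each process tree to its own image directory on disk.
        steps.append(["criu", "dump", "-t", str(pid), "-D", f"{image_root}/{pid}"])
    # Stage the new kernel, then jump into it without going through firmware.
    steps.append(["kexec", "-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"])
    steps.append(["kexec", "-e"])
    for pid in pids:
        # These run under the new kernel, e.g. from an early boot service.
        steps.append(["criu", "restore", "-D", f"{image_root}/{pid}"])
    return steps


# Hypothetical PIDs and paths, for illustration only.
steps = plan_update([100, 200], "/var/ckpt", "/boot/vmlinuz-new", "/boot/initrd-new")
```

Because the images live on disk and are restored by a userspace tool, nothing in the plan depends on the old and new kernels sharing internal data layouts.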
Kernel updates via kexec + CRIU
• Latency improvements
  • Incremental checkpoints
  • On-demand restore
  • Persistent Physical Pages
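The incremental checkpoint idea can be modeled with simple dirty-page tracking. This is a toy sketch under the assumption that memory is a dict of page number to contents; the class and its methods are hypothetical and do not reflect how CRIU tracks dirty pages internally.

```python
from typing import Dict, Set


class IncrementalCheckpointer:
    """Toy model: only pages written since the last checkpoint are saved again."""

    def __init__(self) -> None:
        self.memory: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()
        self.image: Dict[int, bytes] = {}    # stands in for the on-disk image

    def write(self, page: int, data: bytes) -> None:
        self.memory[page] = data
        self.dirty.add(page)

    def checkpoint(self) -> int:
        """Copy dirty pages into the image; return how many were copied."""
        copied = len(self.dirty)
        for page in self.dirty:
            self.image[page] = self.memory[page]
        self.dirty.clear()
        return copied


proc = IncrementalCheckpointer()
for page in range(4):
    proc.write(page, b"old")
early = proc.checkpoint()      # taken while the application keeps running
proc.write(2, b"new")          # the application touches one page afterwards
final = proc.checkpoint()      # the part that counts as downtime copies just that page
```

The earlier passes do most of the copying in the background, so the final pass right before the kernel switch has much less to write.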
Kernel updates via kexec + CRIU
• Demo
  • https://gts3.org/pages/kup.html
PROTEOS
• Assumes microkernel design (e.g., Minix)
• Performs process-level updates (unlike function-level updates)
• State quiescence (unlike function quiescence)
• State-transfer between old/new process versions
• Uses LLVM link-time pass for instrumentation
• Per-update state filters and interface filters
• Strictly event-driven process loops
• Structured design to handle many live update complications
• Supports a wider range of OS updates automatically than Livepatch-like approaches.
• Updating the microkernel itself might be challenging.
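State transfer with a per-update state filter can be sketched as follows. The sketch assumes process state is a flat dict and that the automatic transfer simply copies fields with matching names; the function names and the timeout example are hypothetical, and PROTEOS itself derives this from compiler instrumentation rather than dict keys.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def default_transfer(old_state: State, new_defaults: State) -> State:
    """Copy every field the new version still has; new fields keep their defaults."""
    merged = dict(new_defaults)
    for key, value in old_state.items():
        if key in merged:
            merged[key] = value
    return merged


def live_update(old_state: State, new_defaults: State,
                state_filter: Callable[[State, State], State] = lambda old, new: new) -> State:
    """Transfer state once the old process is quiescent, then apply the
    per-update filter for changes the automatic copy cannot infer."""
    new_state = default_transfer(old_state, new_defaults)
    return state_filter(old_state, new_state)


# Hypothetical update: the new version stores the timeout in ms instead of s.
def timeout_filter(old: State, new: State) -> State:
    new = dict(new)
    new["timeout_ms"] = old["timeout_s"] * 1000
    return new


old = {"open_files": 3, "timeout_s": 5}
new = live_update(old, {"open_files": 0, "timeout_ms": 0}, timeout_filter)
```

Unchanged fields carry over automatically, so the programmer only writes a filter for the fields whose meaning changed.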
Revisiting the wishlist – Are we there yet?
• Ability to fix bugs/vulnerabilities in any part of the kernel (both core + module code)
• Minimal update latency (say, < 10 seconds)
• Ability to roll back on update failure
• Minimal programmer effort to tailor fixes to live update scenarios
References
• Ksplice: Automatic Rebootless Kernel Updates
  https://pdos.csail.mit.edu/papers/ksplice:eurosys.pdf
• kGraft, kpatch and Livepatch:
  • https://lwn.net/Articles/596854/
  • https://lwn.net/Articles/597407/
  • https://lwn.net/Articles/734765/
• Kexec + CRIU: Instant OS Updates via Userspace Checkpoint-and-Restart
  https://www.usenix.org/system/files/conference/atc16/atc16_paper-kashyap.pdf
• PROTEOS: Safe and Automatic Live Update for Operating Systems
  https://www.cs.vu.nl/~giuffrida/papers/asplos-2013.pdf