Post on 16-Oct-2021
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Without Stop MachineThe Next Step of Kernel Live Patching
Masami Hiramatsu<masami.hiramatsu.pt@hitachi.com>Linux Technology Research CenterYokohama Research Lab. Hitachi Ltd.,
LinuxCon NA 2014(Aug. 2014)
1
© Hitachi, Ltd. 2014. All rights reserved.
Speaker
• Masami Hiramatsu– A researcher, working for Hitachi
• Researching many RAS features
– A linux kprobes-related maintainer• Ftrace dynamic kernel event (a.k.a. kprobe-tracer)• Perf probe (a tool to set up the dynamic events)• X86 instruction decoder (in kernel)
2
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story
Kpatch internal
Kpatch without stop_machine
Conclusion and Discussion
Note: this presentation is only focusing on the kernel-module side of kpatch.
More generic design and implementation, please attend to -
kpatch: Have Your Security And Eat It Too! – Josh Poimboeuf
Aug. 22. pm2:30
3
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal Kpatch Overview
Active Safeness check
Stop_machine
Kpatch without stop_machine Live Patcing Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion
4
© Hitachi, Ltd. 2014. All rights reserved.
What’s Kpatch?
• Kpatch is a LIVE patching function for kernel– This applys a binary patch to kernel on-line– Patching is done without shutdown
• Only for a small and critical issues– Not for major kernel update
5
© Hitachi, Ltd. 2014. All rights reserved.
Why Live Patching Required?
• Live patching is important for appliances for mission critical systems– Some embedded appliances are hard to maintain
frequently• Those are distributed widely in country side• Not in the big data center!
– Some appliances can’t accept 10ms downtime• Factory control system etc.
6
© Hitachi, Ltd. 2014. All rights reserved.
But for Major Updates?
• M.C. systems have periodic maintenance– Major fixes can be applied and rebooted– In between the maintenance, live patching will be used
• Live patching and major update are complementeach other– Live patching temporarily fixes small critical incidents– Major update permanently fixes all bugs
7
Planned Maintenance
Major UpdateLive Patching
criticalminor
© Hitachi, Ltd. 2014. All rights reserved.
The History of Live patching on Linux
• Live patching is not new
8
Pannus Live Patch (2004-2006)http://pannus.sourceforge.net/
Livepatch (2005)http://ukai.jp/Slides/2005/1202-b2con/mop/livepatch.html
ksplice (2007-)https://www.ksplice.com/
kGraft (2014)
kpatch (2014)
2004
For application
For kernel
2014
Use ftrace to replace functionWill be supported by major distributors
Build from scratchAcquired and supported by Oracle
Developed for CGLNo distribution support
Ptrace and mmap based one
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal Kpatch Overview
Active Safeness check
Stop_machine
Kpatch without stop_machine Live Patcing Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion
9
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch: Overview
• Kpatch has 2 components– Kpatch build: Build a binary patch module– Kpatch.ko: The kernel module of Kpatch
10
Originalsrc
Patchedsrc
Patch
Build Currentvmlinux
Build Patchedvmlinux
Per-functiondiff
Patch module
A module whichcontains new
functions
Done by Kpatch build
Runninglinux
Kpatch.ko
Done by kpatch.ko
Do patchNew functions
On old functions
Today’s talk
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch: How to Patch
• Kpatch uses Ftrace to patch– Hook the target function entry with registers– Change regs->ip to new function (change the flow)
11
Call foo()Call foo()
Call fentryCall fentrySave RegsSave Regs
Call ftrace_opsCall ftrace_ops Get new IP fromHash table
Get new IP fromHash table
Restore RegsRestore RegsChange regs->ipChange regs->ip
ReturnReturn
……
ReturnReturn
……
foo()
New foo()Ret to regs->ipRet to regs->ip
……
Ftrace hooksFoo() entry
Find the new IP from hashtable
Ftrace returnsTo new foo()
ReturnReturn
Foo() is called evenAfter patching.
Function pointer isAvailable
ftrace
kpatch.ko
© Hitachi, Ltd. 2014. All rights reserved.
Conflict of Old and New Functions
• Kpatch will update the execution path of a function– Q: What happen if the patched function is under
executed?– A: Old and new functions are executed at the
same time
!!This should not happen!!
• Kpatch ensures the old functions are not executed when patching– “Active Safeness Check”
12
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check
• Executing functions are on the stack• And IP register points current function too
• Active Safeness Check– Do stack dump to check the target functions are
not executed, for each thread.– Need to be done when the process is stopped.
– stop_machine is used13
Func1Func1
Func2Func2
Func3Func3 Func1+XX
Func2+YY
Stack at here
Func1+XX
Stack at here
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
14
Time
PatchingProcess
Process1
Process2
Process3
AddFtrace entry
Call old func1
Call old func2
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
15
Time
PatchingProcess
Process1
Process2
Process3
AddFtrace entry
Call old func1
Call old func2
All running processes and interrupts are stopped
Call Stop Machine
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
16
Time
PatchingProcess
Process1
Process2
Process3
AddFtrace entry
SafenessCheck
Call old func1
Call old func2
Walk through the allThread and check
Old funcs on stacks
All running processes and interrupts are stopped
Call Stop Machine
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
17
Time
PatchingProcess
Process1
Process2
Process3
AddFtrace entry
Update hashtable
SafenessCheck
Call old func1
Call old func2
Call New func2
Call New func2
Call New func1
Call New func1
Walk through the allThread and check
Old funcs on stacks
Now switch toNew functionsAll running processes and
interrupts are stopped
Call Stop Machine Return
© Hitachi, Ltd. 2014. All rights reserved.
Stop_machine: Pros and Cons
• Pros– Safe, simple and easy to review, Good for the 1st
version
• Cons– Stop_machine stops all processes a while
• It is critical for control/network appliances
– In virtual environment, this takes longer time• We need to wait all VCPUs are scheduled on the host
machine
18
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal Kpatch and Ftrace
Kpatch and Safeness check
Stop_machine
Kpatch without stop_machine Live Patching Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion
19
© Hitachi, Ltd. 2014. All rights reserved.
Live Patching Rules
• Live patching must follow the rules1. All the new functions in a patch must be applied
at once● We need an atomic operation
2. After switching new function, the old function must not be executed● We have to ensure no threads runs on old
functions● And no threads sleeps on them
20
© Hitachi, Ltd. 2014. All rights reserved.
Two actions for the solution
1. Introduce an atomic reference counter
2. Active safeness check at the context switch
21
© Hitachi, Ltd. 2014. All rights reserved.
Atomic Function Reference Counter
1. Introduce an atomic reference counter– Without stop_machine, functions can be called
while patching• Ensure no one actually runs functions -> refcounter• Increment the refcounter at entry• Decrement the refcounter at exit
– If refcounter is 0, update ALL function paths• We are sure there is no users
22
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcountTime
PatchingProcess
Process1
Process2
Process3
AddFtrace entry+1
Start patcingRefcnt = 1
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcountTime
PatchingProcess
Process1
Process2
Process3
AddFtrace entry
Prepare newHash table
SafenessCheck
Call old func1
Call old func2
Call old func1
+1 +1
+1
-1-1
-1
+1
Start patcingRefcnt = 1
Each function callInc/dec refcnt
While patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcount
25
Time
PatchingProcess
Process1
Process2
Process3
AddFtrace entry
Prepare newHash table
SafenessCheck
Call old func1
Call old func2
Call old func3
Call old func3
Call old func1
+1 +1
+1
+1
+1
-1
-1-1
-1
-1
+1
Patching processIs over, but refcnt
Is not 0
Start patcingRefcnt = 1
Each function callInc/dec refcntWhile patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcount
26
Time
PatchingProcess
Process1
Process2
Process3
AddFtrace entry
Prepare newHash table
SafenessCheck
Call old func1
Call old func2
Call old func3
Call old func3
Call old func1
Call New func2
Call New func2
Call New func1
Call New func1
+1 +1
+1
+1
+1
-1
-1-1 -1
-1
-1
+1
Patching processIs over, but refcnt
Is not 0
Refcnt is 0Patch enabled on
all funcs
Start patcingRefcnt = 1
Don’t countrefcnt anymore
Each function callInc/dec refcntWhile patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter (cont.)
• Control the reference counter– Need to stop counting before and after patching– Use atomic_inc_not_zero/dec_if_positive
• These are stopped automatically if counter == 0
27
func call
00
Do not hook functionEntry/exit before patching
refcnt
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter (cont.)
• Control the reference counter– Need to stop counting before and after patching– Use atomic_inc_not_zero/dec_if_positive
• These are stopped automatically if counter == 0
28
1 2 1 0
atomic_dec_if_positiveatomic_inc_not_zerofunc call
atomic_incforcibly inc refcnt Patching
atomic_dec
func call
00refcnt
Do not hook functionEntry/exit before patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter (cont.)
• Control the reference counter– Need to stop counting before and after patching– Use atomic_inc_not_zero/dec_if_positive
• These are stopped automatically if counter == 0
29
1 2 1 0 0 0
atomic_dec_if_positive
->do nothing
atomic_inc_not_zero-> do nothing
func call
atomic_dec_if_positiveatomic_inc_not_zerofunc call
atomic_incPatching
atomic_dec
func call
0
atomic_dec_if_positive
->do nothing
0refcnt
Do not hook functionEntry/exit before patching
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness check without stop_machine
2. Active safeness check at the context switch– To find threads sleeping(or going to sleep) on the
functions– For all running processes, hook the context
switch and check stack entries safely.– For the sleeping tasks, we can check it safely.
30
© Hitachi, Ltd. 2014. All rights reserved.
Safeness Check without Stop_machine
• 2 stages safeness checking
31
PatchingProcess
stack check if task is not running
rcu
lock
rcu unlo ck
check over all threads
List upRunningthreads
TimeStage1: Check sleeping tasks
Running process A
Running process B
wait on runqRun
Run
context switch
wait on runq
© Hitachi, Ltd. 2014. All rights reserved.
Safeness Check without Stop_machine
• 2 stages safeness checking
32
PatchingProcess
stack check if task is not running
rcu
lock
rcu unlo ck
check over all threadsrunning pid
running pidrunning pid
List upRunningthreads
hook
stack checkfor current process
Update
Wait for all runningtask is checked
Time
…
Stage1: Check sleeping tasks Stage2: Check running tasks
Running process A
Running process B
wait on runqRun
Run
context switch
wait on runq
© Hitachi, Ltd. 2014. All rights reserved.
How to add refcnt and context switch hook?
• To hook the function entry/return– Use kretprobe to hook it– For each function entry/return, inc/dec refcount
• To hook the context switch– Use kprobe to hook it– Do safeness check (on stack) and update running pid list
• Both are dynamic probe– After checking the safeness, all kretprobes/kprobes are r
emoved from the target functions– We have minimal overhead
33
© Hitachi, Ltd. 2014. All rights reserved.
Demo
• Demonstrating kpatching with/without stop_machine– Using ftrace to trace stop_machine()
34
(Setup ftrace)# echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter# echo function_graph > /sys/kernel/debug/tracing/current_tracer
(Run the kpatch)# kpatch load kpatch-test-patch.ko
(Check the result)# echo 0 > /sys/kernel/debug/tracing/tracing_on# cat /sys/kernel/debug/tracing/trace
(Setup ftrace)# echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter# echo function_graph > /sys/kernel/debug/tracing/current_tracer
(Run the kpatch)# kpatch load kpatch-test-patch.ko
(Check the result)# echo 0 > /sys/kernel/debug/tracing/tracing_on# cat /sys/kernel/debug/tracing/trace
© Hitachi, Ltd. 2014. All rights reserved.
Demo result
• With stop_machine
• Without stop_machine
35
cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | | 0) ! 6410.455 us | stop_machine();
cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | | 0) ! 6410.455 us | stop_machine();
cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | |
cat trace# tracer: function_graph## CPU DURATION FUNCTION CALLS# | | | | | | |
No stop_machine is executed
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal Kpatch Overview
Kpatch and Safeness check
Stop_machine
Kpatch without stop_machine Live Patcing Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion
36
© Hitachi, Ltd. 2014. All rights reserved.
Conclusion
• Succeed to get rid of stop_machine() from kpatch– This is a proof of concept of stop_machine-free
kpatch• This means kpatch CAN BE ready for mission critical
systems• But still under discussion stage
• Upstreaming could be a long way– At first, push current stop_machine-based kpatch
to upstream– Stop_machine-free will be the next step
37
© Hitachi, Ltd. 2014. All rights reserved.
Current Issues
• Possible to miscount the function reference– Kretprobe has no error notification
• Kretprobe can be failed to handle the function return because of return-address buffer shortage
• Possible to fail patching with big patch– We have to monitor all the functions are safe in the patch– Big patch has many patched functions
• Some of them can be always used in the system• Incremental patching could be better
• Module unloading using stop_machine– This will happen if we replace old patch with new one– Incremental patching can avoid this.
38
© Hitachi, Ltd. 2014. All rights reserved.
Future work
• Use a generic return-hook mechanism– Kretprobe is for PoC, not for general use
• It can’t detect the failure of hooking
– Should be more safe (e.g. miss-hook handler)
• Context switch hook can be more general– Tracepoint/traceevent makes it better.– This requires kpatch as embedded feature
• Upstreaming
39
© Hitachi, Ltd. 2014. All rights reserved.
Resources and Links
• Get rid of the stop_machine from kpatch• https://github.com/dynup/kpatch/issues/138
• My no-stopmachine branch• https://github.com/mhiramathitachi/kpatch/tree/no-stop
machine-v1
– This requires IPMODIFY flag patchset for kernel• http://thread.gmane.org/gmane.linux.kernel/1757201
40
© Hitachi, Ltd. 2014. All rights reserved.
Questions?
© Hitachi, Ltd. 2014. All rights reserved.
Trademarks
43
• Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
• Other company, product, or service names may be trademarks or service marks of others.