BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel...
Transcript of BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel...
![Page 1: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/1.jpg)
![Page 2: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/2.jpg)
BPF CO-RE (Compile Once – Run Everywhere)
Andrii Nakryiko
![Page 3: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/3.jpg)
Data center
Developing BPF application (today)
#include <linux/bpf.h>#include <linux/filter.h>int prog(struct __sk_buff* skb) {
if (skb->len < X) {return 1;
}...
}
bpf.c
embed
#include <bcc/BPF.h>
std::string BPF_PROGRAM =#include ”path/to/bpf.c”
namespace facebook {. . .
}
DrivingApp.cpp
compileApp package
LLVM/Clang
libbccDrivingApp
bpf.cdeploy
Development server
![Page 4: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/4.jpg)
Developing BPF application (today)
compileApp
System headers
linux/bpf.hlinux/filter.hlinux/shed.hlinux/fs.h...
bpf.c
#include <linux/bpf.h>#include <linux/filter.h>int prog(...) {
...}
ClangKernel
verify
bpf.o
Production server
![Page 5: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/5.jpg)
Problem:“On the fly” compilation
Developing BPF application (today)
![Page 6: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/6.jpg)
• Accessing kernel structs (e.g., task_struct or sk_buff)• Memory layout changes between versions/configurations• BPF code needs to be compiled w/ fixed offsets/sizes
“On the fly” BPF compilationWhy?
![Page 7: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/7.jpg)
1. Every prod machine needs kernel headers2. LLVM/Clang is big and heavy3. Testing is a pain
“On the fly” BPF compilationProblems
![Page 8: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/8.jpg)
Every prod machine needs kernel headers• kernel-devel package required• kernel-devel is missing internal headers• custom one-off kernels are a pain• kernel-devel can get out of sync
“On the fly” BPF compilationProblems
![Page 9: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/9.jpg)
LLVM/Clang is big and heavy• libbcc.so > 100MB• compilation is a heavy-weight process• can use lots of memory and CPU• on busy machine can tip over prod workload
“On the fly” BPF compilationProblems
![Page 10: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/10.jpg)
Testing is a pain• variety of kernel versions/configurations• “works on my machine” means nothing• Problem is detected only at run time
“On the fly” BPF compilationProblems
![Page 11: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/11.jpg)
Can we compile once?Then run same binary everywhere?
![Page 12: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/12.jpg)
• No kernel headers• No “on the fly” compilation• Upfront validation against prod kernels
BPF CO-RE (Compile Once – Run Everywhere)Goals
![Page 13: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/13.jpg)
BPF CO-RE flow
#include <vmlinux.h>#include <bpf_core.h>int prog(struct __sk_buff* skb) {
...}
bpf.c
compile
Data center
Kernel
BTFbpftool vmlinux.h
App package
libbpf
DrivingApp
bpf.o
Clang
bpf.o w/ relocs
package
deploy
Compile Development server
![Page 14: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/14.jpg)
BPF CO-RE flow
validate
bpftoolbpf.o w/ relocs
Test
Kernel 4.16
BTF
Kernel 4.18
BTF
Kernel 4.20
BTF
Kernel 5.0
BTF
Development server
![Page 15: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/15.jpg)
BPF CO-RE flow
App relocate
libbpf
Kernelbpf.o w/ relocs
load/verify
libbpfKernel
BTF
Run Production server
![Page 16: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/16.jpg)
• Self-describing kernel (BTF)• Clang w/ emitted relocations• Libbpf as relocating loader• Tooling for testing
BPF CO-REOverview
![Page 17: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/17.jpg)
• Deduplicated BTF information• compact (no need to strip it out: 2MB vs 177MB of DWARF) • describes all kernel types (size, layout, etc)• always in sync w/ kernel• lossless BTF to C conversion• Available today:• CONFIG_DEBUG_INFO_BTF=y (needs pahole >= v1.13)
BPF CO-RESelf-describing kernel
![Page 18: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/18.jpg)
• Struct layout changes• Version- / config-specific fields (logic in general)• #define macros• Unrelocatable sizeof()
BPF CO-RE Challenges
![Page 19: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/19.jpg)
#include <linux/sched.h>#include <linux/types.h>
int on_event(void* ctx) {struct task_struct *task;u64 read_bytes;task = (void *)bpf_get_current_task();
bpf_probe_read(&read_bytes, sizeof(u64), &task->ioac.read_bytes);
return 0;}
Field offset relocation 0: (85) call bpf_get_current_task1: (07) r0 += 19522: (bf) r1 = r103: (07) r1 += -84: (b7) r2 = 85: (bf) r3 = r06: (85) call bpf_probe_read7: (b7) r0 = 08: (95) exit
Field reloc:- insn: #1 - type: struct task_struct- accessor: 30:4
![Page 20: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/20.jpg)
• Struct layout changes• Kernel version- / config-specific logic• #define macros• Unrelocatable sizeof()
BPF CO-RE Challenges
![Page 21: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/21.jpg)
#include <linux/sched.h>#include <linux/types.h>/* relies on /proc/config.gz */extern bool CONFIG_IO_TASK_ACCOUNTING;
int on_event(void* ctx) {struct task_struct *task;u64 read_bytes;task = (void *)bpf_get_current_task();if (CONFIG_IO_TASK_ACCOUNTING) {
return bpf_probe_read(&read_bytes, sizeof(u64), &task->ioac.read_bytes);
}return 0;
}
Extern relocation 0: (85) call bpf_get_current_task1: (b7) r1 = XXX2: (15) if r1 == 0x0 goto pc+63: (07) r0 += 19524: (bf) r1 = r105: (07) r1 += -86: (b7) r2 = 87: (bf) r3 = r08: (85) call bpf_probe_read9: (b7) r0 = 010: (95) exit
Field reloc:- insn: #3 - type: struct task_struct- accessor: 30:4
Extern reloc:- insn: #1 - name: CONFIG_TASK_IO_ACCOUNTING- type: bool
![Page 22: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/22.jpg)
struct task_struct___custom {u64 experimental;
};
int on_event(void* ctx) {struct task_struct *task, *exp_task;u64 value = 0;task = (void *)bpf_get_current_task();
exp_task = (struct task_struct___custom *)task;bpf_probe_read(&value, sizeof(u64), &exp_task->experimental);
return 0;}
Uncommon/experimental fields
![Page 23: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/23.jpg)
• Struct layout changes• Kernel version- / config-specific logic• #define macros• Unrelocatable sizeof()
BPF CO-RE Challenges
![Page 24: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/24.jpg)
#define macros• Constants, flags, etc…• DWARF doesn’t record #defines, so doesn’t BTF• Copy/paste whatever you need?• bpf_core.h can provide commonly-needed stuff
![Page 25: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/25.jpg)
• Struct layout changes• Kernel version- / config-specific logic• #define macros• Unrelocatable sizeof()
BPF CO-RE Challenges
![Page 26: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/26.jpg)
struct task_struct *task;struct task_io_accounting io_acc;
task = (void *)bpf_get_current_task();
bpf_probe_read(&io_add, sizeof(struct task_io_accounting), &task->ioac);
// accessing fields on the stack is faster than // bpf_probe_read()’ing them individuallyio_acc.io_read_bytes;io_acc.io_write_bytes;io_acc.rchar;io_acc.wchar;
Unrelocatable sizeof()
Not relocatable
![Page 27: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/27.jpg)
struct task_struct *task;struct task_io_accounting io_acc;
task = (void *)bpf_get_current_task();
io_acc = __builtin_bpf_read_field(&task, ioac);
Unrelocatable sizeof()
Maybe relocatable?
Abstracts bitfield access?..
![Page 28: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/28.jpg)
Questions?
![Page 29: BPF CO-RE - Linux kernelvger.kernel.org/bpfconf2019_talks/bpf-core.pdf · •Accessing kernel structs (e.g., task_structor sk_buff) •Memory layout changesbetween versions/configurations](https://reader033.fdocuments.us/reader033/viewer/2022042405/5f1eae534a733e5c397a203e/html5/thumbnails/29.jpg)