An Exploration of Linux Container Network Monitoring and ...

Alban Crequy Exploration of Linux Container Network Monitoring and Visualization ContainerCon Europe - October 2016 https://goo.gl/iDL8te

Transcript of An Exploration of Linux Container Network Monitoring and ...

Page 1: An Exploration of Linux Container Network Monitoring and ...

Alban Crequy

Exploration of Linux Container Network Monitoring and

Visualization

ContainerCon Europe - October 2016
https://goo.gl/iDL8te

Page 2: An Exploration of Linux Container Network Monitoring and ...

Alban Crequy

∘ Worked on the rkt container run-time
∘ Contributed to systemd

https://github.com/alban

Page 3: An Exploration of Linux Container Network Monitoring and ...

Berlin-based software company building foundational Linux technologies

Some examples of what we work on...

OSTree: git for operating system binaries

Page 4: An Exploration of Linux Container Network Monitoring and ...

Find out more about us…

Blog: https://kinvolk.io/blog

Github: https://github.com/kinvolk

Twitter: https://twitter.com/kinvolkio

Email: [email protected]

Page 5: An Exploration of Linux Container Network Monitoring and ...

Plan

∘ First use-case: visualizing TCP connections
∘ Microservices application with containers: Weave Socks
∘ CoreOS Linux, Kubernetes, Weave Scope
∘ Using /proc & conntrack
∘ Limitations
∘ proc connector, eBPF & kprobes
∘ Next use cases:
  ∘ L7, HTTP: eBPF & kprobes
  ∘ Simulating degraded networks with traffic control

Page 6: An Exploration of Linux Container Network Monitoring and ...

The demo application

Page 7: An Exploration of Linux Container Network Monitoring and ...

microservices-demo

https://github.com/microservices-demo/microservices-demo

Page 8: An Exploration of Linux Container Network Monitoring and ...

Some micro-services

front-end Firefox

catalogue

orders orders-db

payment

Page 9: An Exploration of Linux Container Network Monitoring and ...
Page 10: An Exploration of Linux Container Network Monitoring and ...

Orchestrating containers with Kubernetes

Page 11: An Exploration of Linux Container Network Monitoring and ...

Kubernetes Replica Sets

Kubernetes node 1

front-end

Kubernetes node 2

front-end

Kubernetes node 3

orders orders

catalogue catalogue

Page 12: An Exploration of Linux Container Network Monitoring and ...

Kubernetes node 1

front-end

Kubernetes node 2

front-end

Kubernetes node 3

orders orders

Kubernetes Services

orders service

Page 13: An Exploration of Linux Container Network Monitoring and ...

Kubernetes Services

Proxying the traffic from the virtual service IP to a Kubernetes pod

Several implementations possible:

- Userspace proxy in kube-proxy
- iptables rules (Destination NAT) installed by kube-proxy
- Cilium implements a load balancer based on eBPF (tc level)

Page 14: An Exploration of Linux Container Network Monitoring and ...

Weave Scope

Page 15: An Exploration of Linux Container Network Monitoring and ...

Weave Scope

Page 16: An Exploration of Linux Container Network Monitoring and ...

Weave Scope

demo

Page 17: An Exploration of Linux Container Network Monitoring and ...

procfs

Page 18: An Exploration of Linux Container Network Monitoring and ...

procfs files
- /proc/$PID
- /proc/$PID/ns/net: network namespace
- /proc/$PID/fd/: file descriptors
- /proc/$PID/net/tcp: TCP connections
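A minimal sketch of what reading /proc/$PID/net/tcp involves: each row stores the local and remote endpoints as hexadecimal "address:port" pairs, plus a socket inode that can be matched against the "socket:[inode]" links in /proc/$PID/fd/. The sample row and the function names are made up for illustration, and the byte-order handling assumes a little-endian machine such as x86_64.

```python
import socket
import struct

def parse_hex_endpoint(field):
    """Convert 'AABBCCDD:PPPP' from /proc/net/tcp into (ip, port)."""
    ip_hex, port_hex = field.split(":")
    # The IPv4 address is a 32-bit value printed in host byte order;
    # this sketch assumes a little-endian host (e.g. x86_64).
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)

def parse_tcp_line(line):
    """Parse one data row (not the header) of /proc/$PID/net/tcp."""
    fields = line.split()
    return {
        "local": parse_hex_endpoint(fields[1]),
        "remote": parse_hex_endpoint(fields[2]),
        "state": int(fields[3], 16),  # 0x01 is ESTABLISHED, 0x0A is LISTEN
        "inode": int(fields[9]),      # matches socket:[inode] in /proc/$PID/fd/
    }

# Hand-written sample row, not captured from a real system.
SAMPLE = ("   1: 0100007F:1F90 0200007F:D431 01 00000000:00000000 "
          "00:00000000 00000000  1000        0 34567 1 0000000000000000")

if __name__ == "__main__":
    print(parse_tcp_line(SAMPLE))
```

A monitoring tool would run this over every network namespace it knows about, which is where the polling cost on the next slide comes from.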

Page 19: An Exploration of Linux Container Network Monitoring and ...

procfs files

Page 20: An Exploration of Linux Container Network Monitoring and ...

procfs limitations
- No notifications
- Need to read procfs for
  - new processes
  - new network namespaces
  - new sockets
  - every second?
- CPU intensive for systems with high number of processes
- Missing short-lived connections
- Issues with packet modifications (e.g. DNAT)

Page 21: An Exploration of Linux Container Network Monitoring and ...

Packet modifications

Local process

Socket lookup

Traffic control, ingress

packet

Protocol layer

Network layer

Link layer

Local process

NAT

Traffic control, egress

Kubernetes node 1 Kubernetes node 2

Page 22: An Exploration of Linux Container Network Monitoring and ...

Netlink

Page 23: An Exploration of Linux Container Network Monitoring and ...

Netlink sockets

socket(AF_NETLINK, SOCK_RAW, NETLINK_...);

Several Netlink sockets:

- NETLINK_ROUTE
- NETLINK_INET_DIAG
- NETLINK_SELINUX
- NETLINK_CONNECTOR
- NETLINK_NETFILTER
- ...

Page 24: An Exploration of Linux Container Network Monitoring and ...

conntrack

Page 25: An Exploration of Linux Container Network Monitoring and ...

conntrack -E
- Use NETLINK_NETFILTER sockets to subscribe to Conntrack events from the kernel
- Is aware of NAT rewritings
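To illustrate why conntrack events help with NAT, here is a small sketch that parses one event line in the style printed by `conntrack -E` and compares the original direction with the reply direction. When the original destination differs from the reply source, the connection was rewritten by DNAT (for example by a Kubernetes service rule). The sample line is hand-written for illustration, and the function name is an assumption of this sketch.

```python
import re

def parse_conntrack_event(line):
    """Split a conntrack event line into original and reply tuples."""
    event = re.match(r"\s*\[(\w+)\]", line).group(1)
    pairs = re.findall(r"\b(src|dst|sport|dport)=(\S+)", line)
    original = dict(pairs[:4])   # tuple as seen by the connecting process
    reply = dict(pairs[4:8])     # tuple of the expected reply packets
    # With DNAT, replies come from the real backend, not the address dialed.
    dnat = (original["dst"], original["dport"]) != (reply["src"], reply["sport"])
    return {"event": event, "original": original, "reply": reply, "dnat": dnat}

# Hand-written sample in the style of `conntrack -E` output.
SAMPLE_EVENT = ("    [NEW] tcp      6 120 SYN_SENT "
                "src=10.2.0.5 dst=10.3.0.10 sport=41000 dport=80 [UNREPLIED] "
                "src=10.2.1.7 dst=10.2.0.5 sport=8079 dport=41000")

if __name__ == "__main__":
    print(parse_conntrack_event(SAMPLE_EVENT))
```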

Page 26: An Exploration of Linux Container Network Monitoring and ...

conntrack limitations
- Conntrack events don’t include:
  - Process ID
  - Network namespace ID
- Conntrack zones are included but not necessarily used by container run-times
- So harvesting procfs regularly is still necessary

Page 27: An Exploration of Linux Container Network Monitoring and ...

Other kind of Netlink sockets?

Page 28: An Exploration of Linux Container Network Monitoring and ...

NETLINK_INET_DIAG

socket(AF_NETLINK, SOCK_RAW, NETLINK_INET_DIAG);

- Fetch information about sockets
- Used by ss (“another utility to investigate sockets”)
- Basic bytecode to filter the sockets (e.g. “INET_DIAG_BC_JMP”)
- But no notification mechanism
  - Patch “sock_diag: notify packet socket creation/deletion” (2013) rejected

Page 29: An Exploration of Linux Container Network Monitoring and ...

Kernel Connector

socket(AF_NETLINK, SOCK_RAW, NETLINK_CONNECTOR);

Several Kernel Connector agents:

- Device mapper
- HyperV
- Proc connector

Page 30: An Exploration of Linux Container Network Monitoring and ...

Proc connector

bind(sockfd, ...CN_IDX_PROC...);

sendmsg(sockfd, ...PROC_CN_MCAST_LISTEN...)

- Since Linux v2.6.15 (January 2006)

Notifications for:

- fork
- exec
- exit
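The bind/sendmsg pair above can be sketched in Python as follows. The numeric constants are copied from the kernel headers named in the comments; the function names are assumptions of this sketch. `build_listen_message` only packs bytes and runs anywhere, while `subscribe` needs Linux and typically root privileges, so it is shown but not exercised here.

```python
import os
import socket
import struct

NETLINK_CONNECTOR = 11     # <linux/netlink.h>
NLMSG_DONE = 3             # <linux/netlink.h>
CN_IDX_PROC = 1            # <linux/connector.h>
CN_VAL_PROC = 1            # <linux/connector.h>
PROC_CN_MCAST_LISTEN = 1   # <linux/cn_proc.h>

def build_listen_message(pid):
    """Pack nlmsghdr + cn_msg + the PROC_CN_MCAST_LISTEN operation."""
    op = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    # struct cn_msg: id.idx, id.val, seq, ack, len, flags
    cn_msg = struct.pack("=IIIIHH", CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(op), 0)
    payload = cn_msg + op
    # struct nlmsghdr: len, type, flags, seq, pid
    header = struct.pack("=IHHII", 16 + len(payload), NLMSG_DONE, 0, 0, pid)
    return header + payload

def subscribe():
    """Open a proc connector socket (Linux only, usually requires root)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_CONNECTOR)
    sock.bind((os.getpid(), CN_IDX_PROC))
    sock.send(build_listen_message(os.getpid()))
    return sock  # sock.recv() then yields fork/exec/exit event messages

if __name__ == "__main__":
    print(build_listen_message(os.getpid()).hex())
```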

Page 31: An Exploration of Linux Container Network Monitoring and ...

Proc connector

Missing:

- network namespace
  - RFC patch “proc connector: add namespace events” last month: https://lkml.org/lkml/2016/9/8/588
- Sockets

So harvesting procfs regularly still necessary

Page 32: An Exploration of Linux Container Network Monitoring and ...

Proc connector

demo

Page 33: An Exploration of Linux Container Network Monitoring and ...

BPF

Page 34: An Exploration of Linux Container Network Monitoring and ...

Classic BPF (cBPF)

socket

kernel

userspace

BPF_JMP... BPF_LD... BPF_RET...

setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));

recvfrom()
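As a rough illustration of what such a classic BPF program looks like, the sketch below writes a four-instruction filter that accepts only IPv4 Ethernet frames, packs it into the `struct sock_filter` layout, and runs it through a toy interpreter that understands just the three opcodes used. The interpreter and the names are assumptions made for this example; in the kernel the program would be attached with SO_ATTACH_FILTER through a `struct sock_fprog` pointing at the packed bytes, which is not shown.

```python
import struct

# Opcode values from <linux/bpf_common.h>
BPF_LD_H_ABS = 0x28   # BPF_LD | BPF_H | BPF_ABS: A = 16-bit word at offset k
BPF_JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K: compare A with k
BPF_RET_K = 0x06      # BPF_RET | BPF_K: return k (bytes of packet to keep)

ETH_P_IP = 0x0800

# Each instruction is (code, jt, jf, k).
IPV4_ONLY = [
    (BPF_LD_H_ABS, 0, 0, 12),        # load the EtherType field
    (BPF_JEQ_K, 0, 1, ETH_P_IP),     # IPv4: fall through, else skip one
    (BPF_RET_K, 0, 0, 0x00040000),   # accept, keep up to 262144 bytes
    (BPF_RET_K, 0, 0, 0),            # reject
]

def assemble(program):
    """Pack into an array of struct sock_filter {u16 code; u8 jt; u8 jf; u32 k}."""
    return b"".join(struct.pack("=HBBI", *ins) for ins in program)

def run(program, packet):
    """Toy interpreter for the three opcodes above (illustration only)."""
    pc, acc = 0, 0
    while True:
        code, jt, jf, k = program[pc]
        pc += 1
        if code == BPF_LD_H_ABS:
            acc = int.from_bytes(packet[k:k + 2], "big")
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("opcode not supported by this sketch")

if __name__ == "__main__":
    frame = bytes(12) + b"\x08\x00" + b"payload"
    print(run(IPV4_ONLY, frame), len(assemble(IPV4_ONLY)))
```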

Page 35: An Exploration of Linux Container Network Monitoring and ...

Extended BPF (or eBPF)

Program type:

- BPF_PROG_TYPE_SOCKET_FILTER
- BPF_PROG_TYPE_KPROBE
- BPF_PROG_TYPE_SCHED_CLS
- BPF_PROG_TYPE_SCHED_ACT
- BPF_PROG_TYPE_TRACEPOINT (Linux >= 4.7)
- BPF_PROG_TYPE_XDP

Page 36: An Exploration of Linux Container Network Monitoring and ...

eBPF classifier for qdiscs

eth0

classifier

kernel

userspace

BPF_JMP...BPF_LD...BPF_RET...

if (skb->protocol…) return TC_H_MAKE(TC_H_ROOT, mark);

compilation

clang... -march=bpf

upload in the kernel:

- bpf()
- Netlink

x86_64 code

JIT compilation

Page 37: An Exploration of Linux Container Network Monitoring and ...

eBPF maps

kernel

userspace

x86_64 code

eBPF maps

Userspace program

∘ Keep context between calls
∘ Report statistics to userspace

Page 38: An Exploration of Linux Container Network Monitoring and ...

Tracepoints with eBPF
- BPF_PROG_TYPE_TRACEPOINT since Linux 4.7
- Find the list of tracepoints in /sys/kernel/debug/tracing/events
- Stable API
- But limited tracepoints

Page 39: An Exploration of Linux Container Network Monitoring and ...

kprobes with eBPF
- BPF_PROG_TYPE_KPROBE since Linux 4.1
- No ABI guarantees
- Probe any kernel function

Page 40: An Exploration of Linux Container Network Monitoring and ...

Socket events with kprobe / eBPF
- BPF Compiler Collection (BCC)
  - bcc/examples/tracing/tcpv4connect.py
  - Iago’s tcp4tracer (WIP)
    - Get connection tuple, pid, netns
    - tcp_v4_connect
    - tcp_close
    - inet_csk_accept
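A minimal BCC sketch of the kprobe approach, assuming the BCC Python bindings are installed and the script runs as root on a kernel with kprobe support. It only prints the PID of each process calling tcp_v4_connect; extracting the connection tuple and the network namespace, as tcp4tracer does, is left out. This is not the code of tcpv4connect.py or tcp4tracer, and the handler name `trace_connect` is made up for the example.

```python
# eBPF program in restricted C, compiled by BCC when loaded.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

int trace_connect(struct pt_regs *ctx, struct sock *sk)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    bpf_trace_printk("tcp_v4_connect pid=%d\n", pid);
    return 0;
}
"""

# Kernel function to probe mapped to the handler defined above.
PROBES = {"tcp_v4_connect": "trace_connect"}

def run():
    """Load the program and stream trace output (needs BCC and root)."""
    from bcc import BPF  # imported lazily so the module loads without BCC
    bpf = BPF(text=BPF_PROGRAM)
    for kernel_function, handler in PROBES.items():
        bpf.attach_kprobe(event=kernel_function, fn_name=handler)
    bpf.trace_print()
```

Calling `run()` blocks and prints one line per outgoing IPv4 TCP connection attempt. Because kprobes carry no ABI guarantees, the probed function name may need adjusting on other kernel versions.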

Page 41: An Exploration of Linux Container Network Monitoring and ...

Packet modifications

Local process

Socket lookup

Traffic control, ingress

packet

Protocol layer

Network layer

Link layer

Local process

NAT

Traffic control, egress

Kubernetes node 1 Kubernetes node 2

Page 42: An Exploration of Linux Container Network Monitoring and ...

tcp4tracer & NAT
- The connection tuple from the process’ point of view is not enough
  - NAT
  - Kubernetes Services
- Iago’s tcp4tracer (WIP)
  - nf_nat_ipv4_manip_pkt
  - nf_nat_tcp_manip_pkt

Page 43: An Exploration of Linux Container Network Monitoring and ...

More metrics

Page 44: An Exploration of Linux Container Network Monitoring and ...

Weave Scope architecture

Kubernetes node 1

Kubernetes node 2

Scope App

Scope Probe

Firefox

Scope Probe

Page 45: An Exploration of Linux Container Network Monitoring and ...

Weave Scope plugins

Kubernetes node 1

Kubernetes node 2

Scope App

Scope Probe

Firefox

Scope Probe

plugin plugin plugin plugin

Page 46: An Exploration of Linux Container Network Monitoring and ...
Page 47: An Exploration of Linux Container Network Monitoring and ...

HTTP requests plugin
- Number of HTTP requests per second
- Without instrumenting the application
- eBPF kprobe on skb_copy_datagram_iter

kernel

userspace

HTTP server: recvfrom()
HTTP client: sendmsg()

GET / HTTP/1.1

skb_copy_datagram_iter() copies the skb into the iovec
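The check performed on the copied data can be sketched in userspace Python: look at the first line of the buffer and decide whether it is an HTTP/1.x request line. The real plugin does this inside an eBPF program; the method list and function names here are assumptions for illustration.

```python
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ",
                b"DELETE ", b"OPTIONS ", b"PATCH ")
HTTP_VERSIONS = (b" HTTP/1.0", b" HTTP/1.1")

def is_http_request(payload):
    """True if the buffer starts with an HTTP/1.x request line."""
    request_line = payload.split(b"\r\n", 1)[0]
    return (request_line.startswith(HTTP_METHODS)
            and request_line.endswith(HTTP_VERSIONS))

def count_requests(payloads):
    """Count buffers that look like the start of an HTTP request."""
    return sum(1 for payload in payloads if is_http_request(payload))

if __name__ == "__main__":
    print(is_http_request(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n"))
```

Dividing such a count by the sampling interval gives the requests-per-second figure.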

Page 48: An Exploration of Linux Container Network Monitoring and ...

HTTP responses plugin
- Number of HTTP responses by category (404, etc.)
- Without instrumenting the application
- eBPF kprobe on skb_copy_datagram_from_iter
- Using an eBPF map to track the context between kprobe & kretprobe

kernel

userspace

HTTP server: sendmsg()
HTTP client: recvfrom()

HTTP/1.0 200 OK

skb_copy_datagram_from_iter() copies the iovec into the skb
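On the response side, the interesting part of the buffer is the status line. A userspace sketch of the classification step, with hypothetical function names and without the kprobe/kretprobe bookkeeping that the plugin keeps in an eBPF map:

```python
import re
from collections import Counter

# Matches the start of an HTTP/1.x status line such as "HTTP/1.0 200 OK".
STATUS_LINE = re.compile(rb"HTTP/1\.[01] (\d{3})")

def status_code(payload):
    """Return the numeric status code, or None if this is not a response."""
    match = STATUS_LINE.match(payload)
    return int(match.group(1)) if match else None

def count_by_status(payloads):
    """Tally responses per status code, ignoring non-HTTP buffers."""
    codes = (status_code(payload) for payload in payloads)
    return Counter(code for code in codes if code is not None)

if __name__ == "__main__":
    print(count_by_status([b"HTTP/1.0 200 OK\r\n", b"HTTP/1.1 404 Not Found\r\n"]))
```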

Page 49: An Exploration of Linux Container Network Monitoring and ...

Testing degraded networks

Page 50: An Exploration of Linux Container Network Monitoring and ...

Traffic control, why?

web server client

client

client

THE INTERNET

∘ fair distribution of bandwidth

∘ reserve bandwidth to specific applications

∘ avoid bufferbloat

Page 51: An Exploration of Linux Container Network Monitoring and ...

Queuing disciplines (qdisc)

∘ Network scheduling algorithm
  ∘ which packet to emit next?
  ∘ when?
∘ Configurable at run-time:
  ∘ /sbin/tc
  ∘ Netlink
∘ Default on new network interfaces: sysctl net.core.default_qdisc

eth0 qdisc THE INTERNET

Page 52: An Exploration of Linux Container Network Monitoring and ...

Stochastic Fairness Queueing (sfq)

eth0

THE INTERNET

FIFO n

FIFO 1

FIFO 0

...

round robin
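The idea in the diagram (hash each flow to one of n FIFOs, then serve the FIFOs round robin) can be sketched as a toy model. This is a simplification, not the kernel's sfq: it omits the periodic hash perturbation and the byte-based quantum, and the class name and default hash are assumptions of the example.

```python
import zlib
from collections import deque

class SimpleSFQ:
    """Toy model: flows are hashed to FIFOs, FIFOs are served round robin."""

    def __init__(self, buckets=4, bucket_of=None):
        self.queues = [deque() for _ in range(buckets)]
        # Default hash is deterministic; callers may supply their own mapping.
        self.bucket_of = bucket_of or (
            lambda flow: zlib.crc32(repr(flow).encode()) % buckets)
        self.next_bucket = 0

    def enqueue(self, flow, packet):
        self.queues[self.bucket_of(flow)].append(packet)

    def dequeue(self):
        """Return the next packet, or None when every FIFO is empty."""
        for _ in range(len(self.queues)):
            queue = self.queues[self.next_bucket]
            self.next_bucket = (self.next_bucket + 1) % len(self.queues)
            if queue:
                return queue.popleft()
        return None

if __name__ == "__main__":
    sfq = SimpleSFQ(buckets=2, bucket_of=lambda flow: 0 if flow == "bulk" else 1)
    for name in ("a1", "a2", "a3"):
        sfq.enqueue("bulk", name)
    sfq.enqueue("ssh", "b1")
    print([sfq.dequeue() for _ in range(4)])
```

The light flow's single packet is sent second even though it was queued last, which is the fairness property the slide is after.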

Page 53: An Exploration of Linux Container Network Monitoring and ...

Demo

Reproduce this demo yourself: https://github.com/kinvolk/demo

Page 54: An Exploration of Linux Container Network Monitoring and ...

Network emulator (netem)

eth0 netem THE INTERNET

bandwidth

latency packet loss

corrupt...

Page 55: An Exploration of Linux Container Network Monitoring and ...

Testing with containers

container 1 container 2

eth0 eth0

Testing framework

configure “netem” qdiscs: bandwidth, latency, packet drop...
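A testing framework could configure netem by shelling out to tc. The helper below only builds the argument list, so it can be inspected without root; actually running it (for example with subprocess inside the container's network namespace) requires privileges. The helper name and parameters are assumptions for this sketch; `delay`, `loss`, `rate` and `corrupt` are netem options.

```python
def netem_command(device, delay=None, loss=None, rate=None, corrupt=None):
    """Build a `tc qdisc add ... netem` command as an argv list."""
    command = ["tc", "qdisc", "add", "dev", device, "root", "netem"]
    options = (("delay", delay), ("loss", loss),
               ("rate", rate), ("corrupt", corrupt))
    for name, value in options:
        if value is not None:
            command += [name, value]
    return command

if __name__ == "__main__":
    # Example: 100 ms of added latency and 1% packet loss on eth0.
    print(" ".join(netem_command("eth0", delay="100ms", loss="1%")))
```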

Page 56: An Exploration of Linux Container Network Monitoring and ...

Add latency on a specific connection

front-end Firefox

catalogue

orders orders-db

payment

latency=100ms

Page 57: An Exploration of Linux Container Network Monitoring and ...

How to define classes of traffic

eth0

netem

interface

latency=100ms

dest_ip=10.0.4.* dest_ip=10.0.5.* other

Page 58: An Exploration of Linux Container Network Monitoring and ...

u32: filter on content

[Diagram: qdisc tree on eth0]

- interface: eth0
- root qdisc (type = HTB)
- root class (type = HTB)
- filters (type = u32): ip=10.0.4.*, ip=10.0.5.*, other
- leaf classes (type = HTB): one per filter
- leaf qdiscs (type = netem): one per leaf class
- latency=10ms
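One way to build such a tree with tc is sketched below: an HTB root qdisc, a root class, then for each destination subnet a leaf HTB class, a netem leaf qdisc and a u32 filter, plus a default class for other traffic. The class numbering, the 1gbit rate and the helper name are arbitrary choices for this example, not taken from the demo.

```python
def htb_netem_commands(device, delays, link_rate="1gbit"):
    """Return tc commands; `delays` is a list of (subnet, delay) pairs."""
    commands = [
        # Root qdisc; unmatched traffic goes to class 1:99.
        f"tc qdisc add dev {device} root handle 1: htb default 99",
        # Root class that the leaf classes borrow from.
        f"tc class add dev {device} parent 1: classid 1:1 htb rate {link_rate}",
    ]
    for index, (subnet, delay) in enumerate(delays):
        minor = 10 + index  # leaf ids 10, 11, ... (tc reads handles as hex)
        commands += [
            f"tc class add dev {device} parent 1:1 classid 1:{minor} "
            f"htb rate {link_rate}",
            f"tc qdisc add dev {device} parent 1:{minor} handle {minor}: "
            f"netem delay {delay}",
            f"tc filter add dev {device} protocol ip parent 1: prio 1 "
            f"u32 match ip dst {subnet} flowid 1:{minor}",
        ]
    # Default leaf class for everything else, left without netem.
    commands.append(
        f"tc class add dev {device} parent 1:1 classid 1:99 htb rate {link_rate}")
    return commands

if __name__ == "__main__":
    for line in htb_netem_commands("eth0", [("10.0.4.0/24", "10ms")]):
        print(line)
```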

Page 59: An Exploration of Linux Container Network Monitoring and ...

Filtering with cBPF/eBPF

eth0

BPF

netem netem

kernel

userspace

BPF_JMP... BPF_LD... BPF_RET...

if (skb->protocol…) return TC_H_MAKE(TC_H_ROOT, mark);

compilation

clang... -march=bpf

upload in the kernel:

- bpf()
- Netlink

x86_64 code

JIT compilation

Page 60: An Exploration of Linux Container Network Monitoring and ...

eBPF maps

eth0

BPF

netem netem

kernel

userspace

x86_64 code

eBPF map

tc

Page 61: An Exploration of Linux Container Network Monitoring and ...

Questions?

The slides: https://goo.gl/iDL8te