Device Assignment for VMs in Kubernetes - FOSDEM · Device Assignment for VMs in Kubernetes Martin...
Device Assignment for VMs in Kubernetes
Martin Polednik (@mpolednik) Software Engineer @ Red Hat
$ whoami
• Golang, Python engineer
• working on oVirt and KubeVirt
• node/host management level virtualization tech
• device assignment w/ VFIO, (v)GPU, SR-IOV
• NUMA, hugepages, CPU architectures
• https://mpolednik.github.io/
The Stack
• VM device assignment (VFIO)
• libvirt
• Docker
• Kubernetes
• KubeVirt
Devices & Virtualization
What even is a device?
• many memory regions!
• /sys/bus/pci/devices/${device_address}/...
• /dev/...
VFIO 101
• PCI driver
• devices bound to it can be used in VMs
• IOMMU groups based on DMA isolation
• explained in Slicing a (v)GPU talk at DevConf.cz
• https://www.youtube.com/watch?v=G8b9jlFN-nk
IOMMU Groups
• group contains 1-N devices
• assignment granularity at group level
• e.g. GPU + HDMI sound card
• accessed at /dev/vfio/${N}
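The group-level granularity above can be inspected on a Linux host through sysfs. The following is a minimal sketch, not code from the talk: it assumes the kernel exposes groups under /sys/kernel/iommu_groups/${N}/devices/, and the function names `iommu_groups` and `vfio_endpoint` are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted list of PCI addresses in that group."""
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or not a Linux host: nothing to report.
        return {}
    groups = defaultdict(list)
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not devices_dir.is_dir():
            continue
        for dev in devices_dir.iterdir():
            groups[group_dir.name].append(dev.name)
    return {group: sorted(devs) for group, devs in groups.items()}


def vfio_endpoint(group):
    """Path of the character device that represents the whole group."""
    return f"/dev/vfio/{group}"


if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items()):
        print(vfio_endpoint(group), devices)
```

A group that lists more than one address (for example a GPU and its HDMI audio function) has to be assigned as a unit.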
libvirt
• daemon & library for single-node VM management
• abstracts QEMU cmdline interface by XML
• refers to devices by their PCI address
libvirt
    ...
    <devices>
      ...
      <hostdev managed="no" mode="subsystem" type="pci">
        <source>
          <address bus="7" domain="0" function="0" slot="0" />
        </source>
      </hostdev>
      ...
    </devices>
    ...
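Because libvirt refers to the device by PCI address, the `<hostdev>` element can be generated from an address string such as 0000:07:00.0. The sketch below is illustrative only: `hostdev_xml` is a hypothetical helper, and it writes the address fields as 0x-prefixed hex, which is an assumption about the preferred notation rather than something shown on the slide.

```python
import re
import xml.etree.ElementTree as ET

# domain:bus:slot.function, e.g. 0000:07:00.0
PCI_ADDRESS = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\.(?P<function>[0-7])$"
)


def hostdev_xml(pci_address, managed=False):
    """Build a <hostdev> element (as a string) for one PCI device."""
    match = PCI_ADDRESS.match(pci_address)
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    hostdev = ET.Element(
        "hostdev",
        {
            "managed": "yes" if managed else "no",
            "mode": "subsystem",
            "type": "pci",
        },
    )
    source = ET.SubElement(hostdev, "source")
    # Each field is parsed as hex and written back with a 0x prefix.
    fields = {name: hex(int(value, 16)) for name, value in match.groupdict().items()}
    ET.SubElement(source, "address", fields)
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0000:07:00.0"))
```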
Devices in Containers
Overview
• no special driver needed
• device path exposed to container
• --device, --volume (?), --privileged (?!)
• DRI, toolkits, any required endpoints
• also sets up cgroups
Overview
• sufficient unless orchestration is needed
• ... in that case, building block for Kubernetes device assignment
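For the single-host case, exposing a device to a container comes down to passing its path with `--device`. A small sketch of assembling such a command line follows; `docker_run_args` is a hypothetical helper, and the image name and device path in the usage line are placeholders, not values from the talk.

```python
def docker_run_args(image, device_paths, command=()):
    """Return the argv for `docker run` with each device path exposed."""
    args = ["docker", "run", "--rm"]
    for path in device_paths:
        # --device makes the host device node visible inside the container
        # and lets Docker set up the matching device cgroup rule.
        args += ["--device", path]
    return args + [image, *command]


if __name__ == "__main__":
    # Placeholder image and device; pass the list to subprocess.run to execute.
    print(" ".join(docker_run_args("fedora", ["/dev/kvm"], ["ls", "-l", "/dev/kvm"])))
```

This avoids `--privileged`, which grants far more than access to one device.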
Devices in Kubernetes
Kubernetes 101
• orchestrates containers (in a declarative way)
• pod = one or more containers
• pod, container, node etc. are just resources
• the talk will show resources in YAMLs
NVIDIA GPUs
• vendor-specific feature since 1.3
• `accelerators` FeatureGate
• request N GPUs
NVIDIA GPUs
    spec:
      containers:
      - name: demo
        ...
        resources:
          requests:
            alpha.kubernetes.io/nvidia-gpu: 2
NVIDIA GPUs
• deprecated by device plugins
Device Plugins
• since Kubernetes 1.8
• shortened to DPI(s)
• gated behind `DevicePlugins` FeatureGate
• gRPC server(s) that expose available resources
• Register, Allocate, ListAndWatch
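The division of labour between the three calls can be modelled without any gRPC at all. The toy below is only a mental model under stated assumptions: `ToyDevicePlugin` and `ToyKubelet` are invented classes, the device paths are hypothetical, and the real API is a gRPC service with richer messages. It shows the plugin registering its resource name, the kubelet side reading the device list, and an allocation returning what to expose to the container.

```python
class ToyDevicePlugin:
    """In-memory stand-in for one device plugin (one resource name)."""

    def __init__(self, resource_name, devices):
        self.resource_name = resource_name
        self.devices = dict(devices)  # device ID -> host path (hypothetical)

    def list_and_watch(self):
        # The real ListAndWatch is a long-lived stream that re-sends the
        # list whenever it changes; this toy yields a single snapshot.
        yield [{"id": dev_id, "health": "Healthy"} for dev_id in self.devices]

    def allocate(self, device_ids):
        # Answer with what should be exposed to the container.
        return {"devices": [self.devices[dev_id] for dev_id in device_ids]}


class ToyKubelet:
    """In-memory stand-in for the kubelet side of the conversation."""

    def __init__(self):
        self.plugins = {}
        self.free_devices = {}

    def register(self, plugin):
        # In the real flow the plugin calls Register on the kubelet,
        # which then connects back and consumes ListAndWatch.
        self.plugins[plugin.resource_name] = plugin
        snapshot = next(plugin.list_and_watch())
        self.free_devices[plugin.resource_name] = [
            dev["id"] for dev in snapshot if dev["health"] == "Healthy"
        ]

    def start_container(self, resource_name, count):
        free = self.free_devices[resource_name]
        if count > len(free):
            raise RuntimeError(f"only {len(free)} x {resource_name} available")
        chosen = free[:count]
        del free[:count]
        return self.plugins[resource_name].allocate(chosen)
```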
Device Plugins
• one gRPC server per tracked resource
fancy starting 50+ gRPC servers?
    $ sh kubectl.sh get nodes --show-all -o json | grep -A 10 alloca
    "allocatable": {
        "cpu": "4",
        "hugepages-1Gi": "0",
        "hugepages-2Mi": "0",
        "memory": "12181600Ki",
        "mpolednik.github.io/102b_0522": "1",
        "mpolednik.github.io/111d_8018": "3",
        "mpolednik.github.io/8086_10c9": "2",
        "mpolednik.github.io/8086_10e8": "4",
        "mpolednik.github.io/8086_244e": "1",
        "mpolednik.github.io/8086_2c70": "1",
        ...
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-apparmor
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            mpolednik.github.io/8086_10e8: 1
          limits:
            mpolednik.github.io/8086_10e8: 1
Device Plugins
• flexible
• allows the node to advertise any resource
• /dev/kvm is a device too!
• and mount it into a container (not pod!)
• still in development
• Deallocate gRPC endpoint?
KubeVirt
KubeVirt
• (not only) pet VMs in Kubernetes
• uses CRD (custom resource definition)
• and several custom services
• based on libvirt
Devices in KubeVirt
• mix of both worlds
• Kubernetes assignment for devices
• VFIO within the (docker) container
• requires custom DPI
• + VM spec to pod spec translation
VFIO DPI
https://github.com/kubevirt/kubernetes-device-plugins (WIP)
VFIO DPI
• ensures vfio-pci is loaded
• enumerates /sys/bus/pci/devices
• for each device found
• get vendor ID, device ID, IOMMU group
• report it back to Kubelet (via gRPC API)
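The enumeration steps above can be approximated in a few lines. This is a sketch under assumptions, not the plugin's actual code: it reads the `vendor` and `device` files and the `iommu_group` symlink that sysfs provides per PCI device, and it names resources `${namespace}/${vendor}_${device}` to match the allocatable listing shown earlier. The function names are invented, and the gRPC reporting step is left out.

```python
import os
from collections import Counter
from pathlib import Path


def read_id(path):
    """sysfs stores IDs as e.g. '0x8086' plus a newline; return '8086'."""
    text = Path(path).read_text().strip()
    return text[2:] if text.startswith("0x") else text


def scan_pci(root="/sys/bus/pci/devices"):
    """Collect address, vendor ID, device ID and IOMMU group per device."""
    found = []
    for dev in sorted(Path(root).iterdir()):
        group_link = dev / "iommu_group"
        group = None
        if group_link.exists():
            # The symlink points at .../iommu_groups/<N>; keep only <N>.
            group = os.path.basename(os.path.realpath(group_link))
        found.append(
            {
                "address": dev.name,
                "vendor": read_id(dev / "vendor"),
                "device": read_id(dev / "device"),
                "iommu_group": group,
            }
        )
    return found


def advertised_resources(devices, namespace="mpolednik.github.io"):
    """Count devices per resource name, e.g. 'mpolednik.github.io/8086_10e8'."""
    names = (f"{namespace}/{d['vendor']}_{d['device']}" for d in devices)
    return dict(Counter(names))


if __name__ == "__main__":
    print(advertised_resources(scan_pci()))
```

Each distinct vendor and device ID pair becomes its own resource name, which is why a node with many device types ends up with many entries.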
VFIO DPI
• the missing parts:
• IOMMU group awareness (report conflicting groups as unhealthy? + DPI topology)
• device deallocation (inotify VFIO endpoint?)
• edge case handling (Kubelet dies, device plugin dies)
Bridging VMs and pods
What We Have (idea)
    spec:
      domain:
        devices:
          ...
          passthrough:
          - type: pci
            vendor: 1000
            device: 1000
        ...
        memory:
What We Need (reality)
    spec:
      containers:
      - name: demo
        ...
        resources:
          requests:
            mpolednik.github.io/1000_1000: 1
          limits:
            mpolednik.github.io/1000_1000: 1
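The gap between the two specs is a mechanical translation. A rough sketch follows, assuming the VM spec has already been parsed into a dictionary with the `spec.domain.devices.passthrough` layout from the idea slide; `pod_resources_for_vm` and the default namespace are illustrative choices, not the KubeVirt implementation.

```python
from collections import Counter


def pod_resources_for_vm(vm, namespace="mpolednik.github.io"):
    """Turn a VM's PCI passthrough list into pod resource requests/limits."""
    devices = vm.get("spec", {}).get("domain", {}).get("devices", {})
    counts = Counter()
    for entry in devices.get("passthrough", []):
        if entry.get("type") != "pci":
            continue  # only PCI passthrough is handled in this sketch
        counts[f"{namespace}/{entry['vendor']}_{entry['device']}"] += 1
    # Requests and limits are set to the same value, as in the pod spec above.
    return {"requests": dict(counts), "limits": dict(counts)}


if __name__ == "__main__":
    vm = {
        "spec": {
            "domain": {
                "devices": {
                    "passthrough": [
                        {"type": "pci", "vendor": "1000", "device": "1000"}
                    ]
                }
            }
        }
    }
    print(pod_resources_for_vm(vm))
```

Asking for the same vendor and device pair twice raises the count to 2 rather than adding a second key.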
VFIO Initializer
• https://github.com/mpolednik/k8s-vfio-initializer-plugin (WIP)
• transform VM requirements to pod
• in Kubernetes-native way
• probably not needed after all
That's it!*
* almost
Is that really all?
• which devices inside pod belong to the VM?
• remember libvirt addressing?
• mount
• /sys
• /sys/bus/pci/devices/${device_address}
• something else?
Devices in KubeVirt
• proposal @ https://github.com/kubevirt/kubevirt/pull/593
• DPI @ https://github.com/kubevirt/kubernetes-device-plugins
• Initializer @ https://github.com/mpolednik/k8s-vfio-initializer-plugin
• comments & suggestions welcome!
Summary
• VMs in Kubernetes are real!
• and so is device assignment
Questions?
Thank you!
Slides & Blog @ https://mpolednik.github.io/