Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the...
Transcript of Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the...
![Page 1: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/1.jpg)
Multi-process QEMU
Marc-Andre LureauSenior Software Engineer, Red Hat, Inc.
Konrad Rzeszutek WilkSoftware Director, Oracle
October 27 2017
![Page 2: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/2.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Slim down QEMU
Konrad Rzeszutek Wilk
Software Director, Oracle, Inc.
Marc-Andre Lureau
Senior Software Engineer, Red Hat, Inc.
Presented with
2
![Page 3: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/3.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
QEMU in virtualization
Xen usage of QEMU
KVM usage of QEMU
Security!
1
2
3
4
3
![Page 4: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/4.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
QEMU usage• Both KVM and Xen use QEMU emulation (IDE, e1000)
• None use the binary translation in QEMU.– Xen and KVM in the hypervisor code base deal with opcodes:
– movdqa m128,xmm– (traped on MMIO access)
• KVM uses QEMU as control stack (launch/destroy guest) as in privileged operations (access to /dev/kvm).
• Xen uses only QEMU emulation (which is why you can’t launch guests with QEMU parameters and need to use libvirt or xl).
4
![Page 5: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/5.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Evil guest attack vectors
• Cloud provides have to deal with risk of customers becoming evil.
• The “customers” have usually four primary attack vectors:– Emulation (VENOM – CVE-2015-3456) of floppy driver, VGA, NICs, etc in QEMU.
– MSRs (x2APIC range gap – CVE-2014-7188) of x2APIC emulation in hypervisor.
– VMCALL (hypercalls to hypervisor – CVE-2012-3497).
– Opcode emulations (INVEPT instructions – CVE-2015-0418).
• This talk is about the first: QEMU and ways to lessen the impact if it is exploited, or alternatively erect more “jails” around QEMU.
5
![Page 6: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/6.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Xen and KVM architecture (usual)6
![Page 7: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/7.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Xen disaggregated architecture
● Move QEMU to be a standalone guest running in ring0 (32MB guest).
● Each stubdomain serves one guest.
● Evil guest has to subvert stub domain emulation first, then from there jump to control domain.
![Page 8: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/8.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Xen disaggregated architecture (network)
● Evil guest uses e1000 for attack.
● QEMU uses PV frontend driver to send packets to real backend
● If evil guest subverts stub domain the next attack is the PV protocol
● CVE-2015-8550: double fetch:
“Specifically the shared memory between the frontend and backend can be fetched twice (during which time the frontend can alter the contents) possibly leading to arbitrary code execution in backend.
● But protocol MUCH simpler than emulated devices.
![Page 9: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/9.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Xen disaggregated architecture (serial)
● Privilege opcodes (out/in) always end up in hypervisor.
● A ring between hypervisor and QEMU for device model to process.
● QEMU and xenstored have a PV ring to copy data back/forth.
![Page 10: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/10.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Xen disaggregated architecture: jail around QEMU
● In effect the barrier between QEMU and control stack is via the PV ring.
● If evil guest exploits stub domain they are the same place as before.
● Attacks left then are via:– MSRs
– Hypervisor hypercalls
– Opcode emulation
– (But this presentation is not about those attacks).
![Page 11: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/11.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Can we do something similar in KVM?
![Page 12: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/12.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
• QEMU is used for emulation and control stack.– If we disaggregate QEMU we can move each component in its own process.
• We have security measures in place: – secomp & ebpf (filter the ioctls to /dev/kvm)
– Containers (chroot jails)
– Continuing work on improving QEMU security
• Sure, but separating components apart (each running in its own jail) means we can focus security audit on the high-stake parts
• OK, how do we do this?
12
Can we do something similar in KVM? Is it needed?
![Page 13: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/13.jpg)
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 13
![Page 14: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/14.jpg)
![Page 15: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/15.jpg)
Multi-process QEMU
Marc-Andre LureauSenior Software Engineer, Red Hat, Inc.
Konrad Rzeszutek WilkSoftware Director, Oracle
October 27 2017
![Page 16: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/16.jpg)
![Page 17: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/17.jpg)
● Motivations for QEMU● Requirements for devices
● KVM features
● Various QEMU solutions● Conclusion & QA
![Page 18: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/18.jpg)
18
A big binary
elmarco@boraha:~$ ls -lhS /bin/ | head -n20
-rwxr-xr-x. 1 root root 33M Aug 16 16:00 dockerd-current
-rwxr-xr-x. 1 root root 17M Sep 15 00:46 emacs-25.3
-rwxr-xr-x. 1 root root 16M Sep 7 16:32 node
-rwxr-xr-x. 1 root root 15M Jun 26 11:51 ocamlopt.byte
-rwxr-xr-x. 1 root root 15M Jul 4 15:33 doxygen
-rwxr-xr-x. 1 root root 13M Aug 16 16:00 docker-current
-rwxr-xr-x. 1 root root 12M Sep 8 21:59 qemu-system-aarch64
-rwxr-xr-x. 1 root root 12M Sep 8 21:59 qemu-system-arm
-rwxr-xr-x. 1 root root 12M Jun 26 11:51 ocaml
-rwxr-xr-x. 1 root root 11M Sep 8 21:59 qemu-system-x86_64
-rwxr-xr-x. 1 root root 11M Sep 8 21:59 qemu-system-i386
-rwxr-xr-x. 1 root root 11M Jun 26 11:51 ocamlc.byte
-rwxr-xr-x. 1 root root 11M Sep 8 21:59 qemu-system-mips64el
-rwxr-xr-x. 1 root root 11M Sep 8 21:59 qemu-system-mips64
-rwxr-xr-x. 1 root root 11M Sep 8 21:59 qemu-system-mipsel
-rwxr-xr-x. 1 root root 11M Sep 8 21:59 qemu-system-mips
-rwxr-xr-x. 1 root root 7.1M Apr 25 17:44 crash
-rwxr-xr-x. 1 root root 6.9M Jun 26 11:51 ocamldoc.opt
-rwxr-xr-x. 1 root root 6.4M Jun 26 11:51 ocamlopt.opt
![Page 19: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/19.jpg)
19
A big project
$ cloc qemu-2.10- files: 4 280- comment: 172 425 - code: 1 186 140
$ cloc kvmtool- files: 275- comment: 3 728- code: 27 844$ cloc crosvm- code: 32 159
$ cloc linux- files: 49 744- code: 16 834 046
How much with all dependencies?
![Page 20: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/20.jpg)
20
Still growing
1.0 1.5 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.100
200000
400000
600000
800000
1000000
1200000
1400000
locMostly in C!
![Page 21: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/21.jpg)
21
Many dependencies
● Fedora 26: qemu 2.9.0-5.fc26.x86_64
$ readelf -d /usr/bin/qemu-system-x86_64 | grep NEEDED | wc -l60
$ ldd /usr/bin/qemu-system-x86_64 | wc -l158
● Kvmtool (with all optional dependencies, gtk3, SDL, vncserver...)
$ readelf -d lkvm | grep NEEDED | wc -l19
$ ldd lkvm | wc -l83
![Page 22: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/22.jpg)
22
Too big to fail
![Page 23: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/23.jpg)
![Page 24: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/24.jpg)
24
Paolo threads
![Page 25: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/25.jpg)
VCPU0
VCPU3
VCPU1
VCPU2
UI
Dev1
Dev2
Dev3
libvirtd
Worker ...
Ideal architecture ?
qemu
![Page 26: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/26.jpg)
Why not?
● The monolithic vs microkernel/services debate● Difficult to manage● Difficult to debug● Difficult to test (test matrix)● Performance?
![Page 27: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/27.jpg)
Why seperate processes?
● Modularity● clear interface separation = less conflicts/bql concerns● smaller qemu, less dependencies● allowing alternative implementations, “crazy” ideas● separate projects, different release cycles...
● Isolation (+iommu) & crash robustness● Better sandboxing (seccomp/ns)● Easier monitoring/tweaking (memory, cpu etc)
![Page 28: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/28.jpg)
Sandboxing for dummies
Change user id
Regular DAC/MAC check
Add/drop capabilites(7)
Subset of root privileges (if needed)
Namespaces(7)Own view/access of the system (uid/pid/ns/net/ipc..)
Seccomp()/bpf
Filter syscalls
Libvirt, minijail, systemd, flatpack...
![Page 29: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/29.jpg)
A word about memory fragmentation
All devices & workloads in a single process can lead to more fragmentation.
Using subprocesses may help to partition the load and more easily reclaim the space.
![Page 30: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/30.jpg)
30
How? various strategies
● Fork-only strategy (crosvm)● Code in same binary● No version combinations, less modularity● Device setup and teardown can be hardcoded in parent
● Exec a helper or device process● Can allow arbitrary implementations● IPC require greater level of stability● Nicer if IPC allows various kind of devices
![Page 31: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/31.jpg)
31
Managing the processes
● Qemu● Not a great idea to fork from qemu (VM space, safety)● Slirp & migration can do it...● Could exec() from an helper process instead?
● Outside, libvirt or other:● Not suitable for command line users● Natural fit for libvirt etc
![Page 32: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/32.jpg)
32
How? various device needs
● HW description & bus registration● Communication mechanism:
● Io / Mmio regions & rw events, Irqs● Memory map (& iommu)● Or at higher level of abstraction (USB etc)
● acpi / device-tree manipulation (& fw_cfg)● Device state & migration● Dirty regions tracking, post-copy...● Object hierarchy / introspection
![Page 33: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/33.jpg)
33
KVM <-> device emulation
Direct memory access
Or VM exit:run = mmap(cpufd,..)ioctl(cpufd, KVM_RUN)run→exit_reason == KVM_EXIT_IO/MMIOrun→io/mmio_addr mappingBQL!MemoryRegionOps.read/write()
← ioctl(vmfd, KVM_IRQ_LINE, irq_level)
![Page 34: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/34.jpg)
34
KVM nifty ioctl
KVM_IOEVENTFD
This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address within the guest. A guest write in the registered address will signal the provided event instead of triggering an exit.
KVM_IRQFD
Allows setting an eventfd to directly trigger a guest interrupt.
![Page 35: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/35.jpg)
35
Ioeventfd vs MemoryRegionOps
struct kvm_ioeventfd {
__u64 datamatch;
__u64 addr; /* legal pio/mmio address */
__u32 len; /* 0, 1, 2, 4, or 8 bytes */
__s32 fd;
__u32 flags;
__u8 pad[36];
};
Write only, coalesced events, not a range API
Extend it to support ranges - IOEVENTFD_FLAG_RANGE?
Then KVM_GET_IOEVENTS (similarity with AIO)
![Page 36: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/36.jpg)
36
For traditional sync devices
IPC qemu → helper (necessary for TCG)
Introduce a KVM user device?devfd = ioctl(vmfd, KVM_CREATE_DEVICE_USER)reg = { .group = KVM_DEV_USER_GROUP, .attr = KVM_DEV_USER_SET_MEMORY_REGION, .addr = (struct) { .slot = 0, .addr = 0x3f8, .flags = PIO, .eventfd = efd } }ioctl(devfd, KVM_SET_DEVICE_ATTR, ®)poll(efd)ioctl(devfd, KVM_GET_DEVICE_CPU_EXITS, &exits)ioctl(devfd, KVM_SET_DEVICE_CPU_EXITS, &exits)
![Page 37: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/37.jpg)
37
Migration
In qemu stream vs out of stream
Handled by qemu or not
Security aspect
Share VMState infrastructure with helper?Instead of blobs
Make it a library, IPC hook for saving/loading to/from stream
Unlikely to be accepted as standard in external projects
Mostly non-existent today, with rare exceptions
![Page 38: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/38.jpg)
38
And today?
✔ VNC / Spice✔ Block devices✔ usbredir / cacard✔ ipmi-bmc-extern✔ TPM emulation✔ ivshmem device✔ vhost, vhost-user ✔ VFIO/mdev
![Page 39: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/39.jpg)
39
VNC & Spice
UI in remote process
Resume session
Migration
VT & monitor?
![Page 40: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/40.jpg)
40
What about?
QEMU to start a graphical client instead?
Remove GTK/SDL/VTE/audio code from qemu?
![Page 41: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/41.jpg)
41
Block devices
$ qemu-nbd -k nbd.sock vm.qcow2
$ qemu -drive driver=nbd, server.path=nbd.sock,server.type=unix
NBD serverprocess
Qemuprocess
(other protocols exist: iSCSI, NBD, SSH, Sheepdog, gluster, http/ftp..)
![Page 42: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/42.jpg)
42
Block devices
Would performance be good enough for general case?
Could use shared memory, to avoid extra copy, opportunistic polling...
![Page 43: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/43.jpg)
43
Usbredir
$ usbredirserver -p 2001 <vendorid>:<prodid>
$ qemu <ehci-uhci> …-chardev socket,port=2001,id=chr-device usb-redir,chardev=chr
USB deviceprocess
Qemuprocess
✓migrate ✗migrate
![Page 44: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/44.jpg)
44
USB Devices
QEMU emulation of USB devices instandalone process using usbredir API?
![Page 45: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/45.jpg)
45
Cacard
$ qemu … -device usb-ccid-chardev socket,server,port=2001,id=chr-device ccid-card-passthru,chardev=chr
$ vscclient <host> 2001
Smartcardprocess
Qemuprocess
✓migrate ✗migrate
![Page 46: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/46.jpg)
46
Ipmi-bmc-extern
$ ipmilan -c conf-file -f cmd-file -s statedir
$ qemu … -device ipmi-bmc-extern,chardev=chr-chardev socket,id=chr,host=localhost,port=…-device isa-ipmi-bt,bmc=bmc0,irq=0
Ipmi/BMCprocess
Qemuprocess
![Page 47: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/47.jpg)
47
TPM emulator
$ swtpm socket --tpmstate dir=/tmp/myvtpm --ctrl type=unixio,path=/tmp/ctrl
$ qemu … -tpmdev emulator,id=tpm0,chardev=chr-chardev socket,id=chr,path=/tmp/ctrl-device tpm-tis,tpmdev=tpm-tpm0,id=tpm0
TPM deviceprocess
Qemuprocess
✓migrate soon
![Page 48: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/48.jpg)
48
Vhost overview
QEM
U
Guest OS
virtio-netQEM
U
Guest OS
Virtio dev
Host
OS
vhostLinux:- net- vsock- scsi
![Page 49: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/49.jpg)
49
vhost-userQ
EM
U
Guest OS
virtio-netQEM
U
Guest OS
virtiovhost-user
Virtio dev
vhost-user
events: kick/call
-net, -scsi today!
-blk, -gpu, -input, -crypto coming!
![Page 50: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/50.jpg)
50
Vhost(-user) in a nutshell
Memory listener to have RAM flat view
SET_MEM_TABLE
SET_VRING_ADDR,SET_VRING_NUM
SET_VRING_KICK(fd), SET_VRING_CALL(fd)
(Fd) GuestAddress
UserAddress
Size
34 0xA000 0xf2bc0000 0x40000000
...
Index DescAddress
UsedAddress
AvailAddress
0 0xf2bc1000
0xf2bc2000
0xf2bc3000
![Page 51: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/51.jpg)
51
vhost-user-gpu = gpu stack outQ
EM
U
Guest OS
virtio-netQEM
U
Guest OS
virtio-gpuvhost-user-gpu
Virtio dev
vhost-user
GPU UPDATE
-object vhost-user-backend,id=vug,cmd="./vhost-user-gpu"-device virtio-vga,virgl=true,vhost-user=vug
GPU socket commands:- SCANOUT- UPDATE- GL SCANOUT- GL UPDATE (+)- CURSOR UPDATE
Could be handled outside of QEMU (spice or client)
Better perfBetter security
![Page 52: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/52.jpg)
52
Benefits of virgl out of process?
- avoids blocking qemu main loop
Shaders may take long to compile
- virgl needs to do polling (GL queries & fences)
- virgl crash (various crash/leaks fixed)
- GL isn’t a very safe API (size/buffer mismatch – ARB_robustness is an extension)
![Page 53: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/53.jpg)
53
Mdev / vfio overview
QEM
U
Guest OS
virtio-netQEM
U
Guest OS
vfio-pci
Host
OS
vfio / mdev
Can be mediated to hardwareOr just software (mtty sample)
![Page 54: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/54.jpg)
54
VFIO in userspace?
Implement PCI devices in userspace with a VFIO-user?
![Page 55: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/55.jpg)
55
Conclusion
● Qemu is mostly monolithic & big today● Strategies to run separate processes exist, but provide
different interfaces & integration levels● Use vhost-user for virtio devices● Many ideas for a multi-process future
![Page 56: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/56.jpg)
56
Questions
![Page 57: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/57.jpg)
57
STOP STOP STOP STOP STOP STOP STOP
STOPSTOP✓migrate
✗migrate
![Page 58: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/58.jpg)
58
Virtio device → vhost-user device
Check ioeventfd support
vhost_dev.vqs = g_new(vhost_queues, N)
vhost_dev_init(vhost, chr, TYPE_USER, timeout)
VirtioDeviceClass.set_status() & reset():vhost_dev_enable_notifiers()
VirtioBus parent: set_guest_notifiers()
Set dev.acked_features = virtio.guest_features
vhost_dev_start()
vhost_virtqueue_mask() forall queues
![Page 59: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/59.jpg)
59
QEM
U
Guest OS
virtio-net
QEM
U 2 Guest OS
vhost-pci-net
Vhost-pci WIP(Inter-VM communication)
QEM
U
Guest OS
virtio-net
QEM
U 1 Guest OS
virtio-netvhost-user
![Page 60: Multi-process QEMU - KVM · • Both KVM and Xen use QEMU emulation (IDE, e1000) • None use the binary translation in QEMU. – Xen and KVM in the hypervisor code base deal with](https://reader035.fdocuments.us/reader035/viewer/2022062311/5e16ac99afd14279c7127e2e/html5/thumbnails/60.jpg)
60
Heterogeneous QEMU
MasterMemory
QEM
U M
ast
er
IDM
VCPU X
QEMU Slave
VCPU Y
IDMSlave
Memory
“[RFC PATCH 0/8] Towards an Heterogeneous QEMU” C. Pinto Sept 2015& virtio-sdm & also xilinx remote-proc
IDM protocol