Devconf.cz 2016 Linux as a guest on Hyper-V

Linux as a guest on Hyper-V Vitaly Kuznetsov Red Hat DevConf 2016


Page 1: Devconf.cz 2016 Linux as a guest on Hyper-V

Linux as a guest on Hyper-V

Vitaly Kuznetsov, Red Hat, DevConf 2016

Page 2: Devconf.cz 2016 Linux as a guest on Hyper-V


Virtualization at Red Hat

● OS-level virtualization at Red Hat:
    ● Xen-based solutions in the past
    ● KVM-based solutions now:
        ● RHEV, OpenStack

● RHEL-as-a-guest efforts:
    ● On KVM: RHEV and OpenStack, standalone
    ● On Xen: Amazon Web Services
    ● On VMware: standalone
    ● On Hyper-V: Azure and standalone

Page 3: Devconf.cz 2016 Linux as a guest on Hyper-V


Microsoft Hyper-V

● Present since Windows Server 2008

● The core of Microsoft Azure cloud

● Hyper-V is a Type 1 hypervisor for the x86 architecture

● Requires hardware support (Intel VT-x, AMD-V)

● Emulates standard x86 platforms:
    ● Generation 1 VM: “legacy” BIOS platform with emulated devices
    ● Generation 2 VM: UEFI platform without emulated devices

Page 4: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V architecture

● Full virtualization with selective enlightenments:
    ● Enlightened I/O paths
        ● Optional for Generation 1
        ● Mandatory for Generation 2
    ● Heartbeat
    ● Utility drivers
    ● Time keeping and synchronization
    ● Crash reporting
    ● ...

Page 5: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V and Linux

● Kernel drivers:
    ● Added to staging in 2009
    ● Out of staging in 2011
    ● Included in all major Linux distributions: RHEL, Fedora, OpenSUSE/SLES, Debian, Ubuntu, ...

● Actively used on Azure: 25% of all VMs are Linux! (Microsoft)

Page 6: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V drivers development

● Commits since 2011 (leaving staging):

[Chart: commits per year, 2011 to 2015 (y-axis 0 to 300), broken down into commits from @microsoft.com, commits from @redhat.com, and commits from other community members]

Page 7: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V drivers in Linux kernel

● Currently present drivers:
    ● hv_storvsc (IDE/SCSI/FC storage)
    ● hv_netvsc (network adapter)
    ● hyperv_fb (framebuffer device)
    ● hyperv-keyboard (keyboard)
    ● hid-hyperv (mouse)
    ● hv_balloon (memory ballooning and hotplug)
    ● hv_util (utility drivers)

Page 8: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V storvsc driver

● High performance storage driver

● Device support:
    ● SCSI
    ● IDE (Gen1 VMs)
    ● Fibre Channel

● Partial SPC-3 compliance since Win8/WS2012

● Full SPC-3 compliance since Win10/WS2016

● Multiqueue support

Page 9: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V netvsc driver

● High performance network driver

● Multiqueue
    ● Supports scaling for RX with vRSS
    ● Dynamic and Static VMQ for TX

● Supports batching for TX

● Decorates each outgoing packet with an RNDIS header

● No NAPI support (yet)

Page 10: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V netvsc performance (Microsoft data)

[Chart: On Local Hyper-V. Throughput (Gbps) by number of connections (1, 8, 64, 256, 1024, 6000); series: WS2012R2, Linux. Data labels as printed: 4.1, 25.0, 28.3, 26.9, 23.4, 15.5; 5.6, 20.7, 30.6, 31.3, 25.2, 10.0]

Note: Server VM CPU: 8 vCPUs of E5-2690 @2.90GHz, on one NUMA node

Page 11: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V netvsc performance (Microsoft data)

[Chart: On Azure G5. Throughput (Gbps) by number of connections (1, 8, 64, 256, 1024, 6000); series: WS2012R2, Linux. Data labels as printed: 2.3, 20.6, 24.2, 23.3, 15.8, 9.9; 4.1, 15.3, 21.3, 17.1, 14.0, 12.7]

Note: VM CPU: 32 vCPUs of E5-2698B v3 @ 2.00GHz, on two NUMA nodes

Page 12: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V utility drivers and daemons

● 'Internal' drivers
    ● Clocksources, clockevents
    ● Time synchronization
    ● Heartbeat

● Paired: kernel driver + userspace daemon
    ● hv_kvp – key/value pair exchange (network settings)
    ● hv_vss – freeze/thaw file systems for backup
    ● hv_fcopy – copy an arbitrary file from the host to the guest

Page 13: Devconf.cz 2016 Linux as a guest on Hyper-V


Memory ballooning and hotplug

● Post memory pressure reports to the host every second.

● “balloon up” request from the host:
    ● Allocate pages and send their PFNs to the host so the actual pages behind these frames can be reused.

● “balloon down” request from the host:
    ● Get PFNs from the host, de-allocate pages.

● Memory hotplug:
    ● Initiated by the host, 2M granularity (128M in Linux)
    ● Possible with 'Dynamic memory' disabled in WS2016
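The balloon up/down exchange can be pictured with a toy model. This is only a sketch of the bookkeeping on the guest side: the `Balloon` class, its method names, and the use of plain integers for PFNs are assumptions made for illustration, not the hv_balloon driver's actual interface.

```python
class Balloon:
    """Toy model of the guest side of ballooning; PFNs are plain integers."""

    def __init__(self, free_pfns):
        self.free = set(free_pfns)   # page frames the guest could still allocate
        self.ballooned = set()       # page frames currently handed to the host

    def balloon_up(self, num_pages):
        """Allocate up to num_pages pages and return their PFNs for the host."""
        pfns = sorted(self.free)[:num_pages]
        self.free.difference_update(pfns)
        self.ballooned.update(pfns)
        return pfns

    def balloon_down(self, pfns):
        """Take back PFNs returned by the host and free those pages again."""
        returned = [p for p in pfns if p in self.ballooned]
        self.ballooned.difference_update(returned)
        self.free.update(returned)
        return len(returned)
```

After a balloon up the guest can no longer use the listed frames, which is what lets the host reuse the memory behind them.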

Page 14: Devconf.cz 2016 Linux as a guest on Hyper-V


Timekeeping

● TSC exists but is not very reliable

● hv_clocksource:
    ● MSR-based
    ● Stable but slow

● TSC PAGE clocksource:
    ● Reading from a shared memory page
    ● Fast as there is no exit to the hypervisor
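A rough sketch of how a TSC page read works: the guest takes a scale and an offset from the shared page, reads the TSC, and computes the reference time without leaving the guest. The formula `((tsc * scale) >> 64) + offset` (in 100 ns units) follows the Hyper-V specification as far as I know; the `TscPage` class, the `read_tsc` callback, and treating sequence 0 as "page invalid" are assumptions of this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MASK64 = (1 << 64) - 1


@dataclass
class TscPage:
    """Hypothetical view of the shared page (field names are illustrative)."""
    sequence: int  # assumed: 0 means "not valid, use the MSR clocksource"
    scale: int     # 64.64 fixed-point multiplier
    offset: int    # signed offset in 100 ns units


def read_reference_time(page: TscPage, read_tsc: Callable[[], int]) -> Optional[int]:
    """Return the reference time in 100 ns units, or None if the page is invalid."""
    while True:
        seq = page.sequence
        if seq == 0:
            return None
        tsc = read_tsc()
        scale, offset = page.scale, page.offset
        # Retry if the host updated the page while we were reading it.
        if page.sequence == seq:
            return (((tsc * scale) >> 64) + offset) & MASK64
```

For example, with `scale = 1 << 63` (a factor of 0.5), `offset = 10` and a TSC value of 1000, the result is 510.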

Page 15: Devconf.cz 2016 Linux as a guest on Hyper-V


Hyper-V drivers in development

● Hvsock
    ● Userspace-to-userspace communications through VMBUS
    ● Similar to VSOCK

● PCI passthrough

● RDMA
    ● Open-source but not upstream yet

Page 16: Devconf.cz 2016 Linux as a guest on Hyper-V


Linux on Hyper-V internals

● Why do we need Hyper-V-specific drivers?
    ● Emulating real hardware is SLOW; other hypervisors have their drivers in kernel too:
        ● KVM: virtio
        ● Xen: blkfront, netfront, balloon, …
        ● VMware: pvscsi, vmxnet3, …
    ● Some devices don't have hardware counterparts:
        ● Utility drivers
        ● Memory ballooning

Page 17: Devconf.cz 2016 Linux as a guest on Hyper-V


“Enlightened drivers”

● The core is VMBUS
    ● Protocol for guest ↔ host communication
    ● Based on a concept of “channels”
        ● Primary/secondary channels for devices
        ● Each channel is bound to a VCPU
        ● Channels don't block each other
    ● Guest → Host signalling by hypercalls
    ● Host → Guest signalling by interrupts for “events” or “messages”
    ● Ring buffers for data exchange

Page 18: Devconf.cz 2016 Linux as a guest on Hyper-V


Hypercalls

● The mechanism to signal something to the host

● A single 4k page per guest

● Setup:
    ● Virtual mapping within the guest
    ● Physical address → HV_X64_MSR_HYPERCALL

● Usage:
    ● Do a function-like call to the page (call id, input addr, output addr)
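The setup step boils down to writing the hypercall page's physical address plus an enable bit into the MSR, and each call passes a control value that carries the call id. The sketch below only computes those two values; the helper names are hypothetical, and the bit layout (enable in bit 0, page frame number from bit 12 up, call code in the low 16 bits, "fast" flag in bit 16) is my reading of the Hyper-V specification, so treat it as an assumption. The call into the page itself cannot be shown from Python.

```python
HV_X64_MSR_HYPERCALL = 0x40000001  # MSR index
PAGE_SHIFT = 12                    # 4k pages


def hypercall_msr_value(page_gpa: int) -> int:
    """Value to write to HV_X64_MSR_HYPERCALL for a page at guest physical address page_gpa."""
    if page_gpa & ((1 << PAGE_SHIFT) - 1):
        raise ValueError("hypercall page must be 4k aligned")
    pfn = page_gpa >> PAGE_SHIFT
    return (pfn << PAGE_SHIFT) | 1  # bit 0: enable


def hypercall_input(call_code: int, fast: bool = False) -> int:
    """Control word for one call: call id in bits 0-15, 'fast' flag in bit 16."""
    return (call_code & 0xFFFF) | (int(fast) << 16)
```

For example, a hypercall page at guest physical address 0x12345000 gives an MSR value of 0x12345001.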

Page 19: Devconf.cz 2016 Linux as a guest on Hyper-V


Host→Guest signalling: “Messages”

● A magical page per-VCPU

● … which contains actual data

● One message at a time

● Message's payload is <= 30 QWORDS (240 bytes)

● Used mainly for setup/teardown paths (channel offers, open/close, unload, …)

● Clockevents also use messages.
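A sketch of consuming one message from the per-VCPU message page. The slot layout used here (a 16-byte header followed by the 30-QWORD payload, 256 bytes per slot, message type 0 meaning "empty") is an assumption based on my reading of the Linux headers; `take_message` is a hypothetical helper, and a real guest would also tell the host that the slot is free again.

```python
import struct

# Assumed header layout: type (u32), payload size (u8), flags (u8),
# reserved (u16), sender (u64).
HEADER = struct.Struct("<IBBHQ")
PAYLOAD_QWORDS = 30
SLOT_SIZE = HEADER.size + PAYLOAD_QWORDS * 8  # 16 + 240 = 256 bytes
HVMSG_NONE = 0


def take_message(page: bytearray, slot: int):
    """Return (message_type, payload) from a slot, or None if the slot is empty."""
    base = slot * SLOT_SIZE
    msg_type, size, _flags, _reserved, _sender = HEADER.unpack_from(page, base)
    if msg_type == HVMSG_NONE:
        return None
    if size > PAYLOAD_QWORDS * 8:
        raise ValueError("payload larger than 30 QWORDS")
    start = base + HEADER.size
    payload = bytes(page[start:start + size])
    # One message at a time: mark the slot empty so the next one can be delivered.
    struct.pack_into("<I", page, base, HVMSG_NONE)
    return msg_type, payload
```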

Page 20: Devconf.cz 2016 Linux as a guest on Hyper-V


Host→Guest signalling: “Events”

● A (different) magic page per-VCPU

● … with an indication that there's pending data on a particular channel.

● Each channel has its own bit so all channels assigned to the same vCPU which need processing are signalled with a single interrupt.

● An event means “go check the ring buffer” and that's where the actual data is.
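The "one bit per channel" idea can be sketched as a bitmap scan: a single interrupt arrives, and the guest walks the bitmap to find every channel that needs attention. The function name and the byte-wise layout are assumptions for illustration; a real guest would test and clear each bit atomically.

```python
from typing import List


def pending_channels(event_bitmap: bytearray) -> List[int]:
    """Return the ids of all channels whose bit is set, clearing the bits as we go."""
    pending = []
    for byte_index, byte in enumerate(event_bitmap):
        if not byte:
            continue
        for bit in range(8):
            if byte & (1 << bit):
                pending.append(byte_index * 8 + bit)
        event_bitmap[byte_index] = 0  # not atomic here; a real guest must be
    return pending
```

Each returned id then means "go check that channel's ring buffer".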

Page 21: Devconf.cz 2016 Linux as a guest on Hyper-V


Ring buffers

● Data transfer mechanism based on shared memory for performance-critical devices

Page 22: Devconf.cz 2016 Linux as a guest on Hyper-V


Ring buffers for channels

● Two separate rings for each channel for guest → host and host → guest communication.

● Different ring sizes for different drivers:
    ● Netvsc – 128 pages
    ● Storvsc – 256 pages
    ● ...

● Need for signalling both ways.
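A minimal model of a channel with its two rings. The class names are hypothetical, the rings here live in ordinary Python memory rather than in pages shared with the host, and giving each direction the full page count is an assumption of this sketch.

```python
PAGE_SIZE = 4096


class Ring:
    """One direction of a channel: a byte ring with a read and a write index."""

    def __init__(self, pages: int):
        self.size = pages * PAGE_SIZE
        self.data = bytearray(self.size)
        self.read_index = 0
        self.write_index = 0

    def used(self) -> int:
        return (self.write_index - self.read_index) % self.size

    def free(self) -> int:
        return self.size - self.used()

    def write(self, payload: bytes) -> bool:
        # Keep one byte unused so read_index == write_index always means "empty".
        if len(payload) >= self.free():
            return False
        for b in payload:
            self.data[self.write_index] = b
            self.write_index = (self.write_index + 1) % self.size
        return True

    def read(self, n: int) -> bytes:
        out = bytearray()
        for _ in range(min(n, self.used())):
            out.append(self.data[self.read_index])
            self.read_index = (self.read_index + 1) % self.size
        return bytes(out)


class Channel:
    def __init__(self, pages: int):
        self.outbound = Ring(pages)  # guest -> host
        self.inbound = Ring(pages)   # host -> guest
```

With the sizes from the slide, `Channel(128)` would model a Netvsc channel and `Channel(256)` a Storvsc one.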

Page 23: Devconf.cz 2016 Linux as a guest on Hyper-V


Receive ring signalling

● We receive an interrupt indicating there are events pending.

● We scan the event page to see which channels have new data.

● For a particular channel with a pending event we read and process all the data on the ring buffer.

● We advance the read pointer, freeing space on the buffer.

● If the host was blocked by the absence of space on the ring we signal it when we're done reading.
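The receive steps above can be sketched as a small simulation. `InboundRing`, its `host_blocked` flag, and the `handle` and `signal_host` callbacks are hypothetical names; packets are kept in a queue instead of a shared byte ring to keep the example short.

```python
from collections import deque


class InboundRing:
    """Host -> guest ring, modelled as a bounded queue of packets."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.packets = deque()
        self.used = 0
        self.host_blocked = False  # set when the host found no space

    def host_write(self, packet: bytes) -> bool:
        if self.used + len(packet) > self.capacity:
            self.host_blocked = True
            return False
        self.packets.append(packet)
        self.used += len(packet)
        return True


def on_interrupt(pending, rings, handle, signal_host):
    """Process every channel flagged in the event page."""
    for chan_id in pending:
        ring = rings[chan_id]
        while ring.packets:                 # read and process all data on the ring
            packet = ring.packets.popleft()
            handle(chan_id, packet)
            ring.used -= len(packet)        # advancing the read pointer frees space
        if ring.host_blocked:               # host was waiting for space: wake it up
            ring.host_blocked = False
            signal_host(chan_id)
```

The host is signalled only when it had actually been blocked, so a normal receive costs no extra hypercall.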

Page 24: Devconf.cz 2016 Linux as a guest on Hyper-V


Transmit ring signalling

● Host guarantees to drain the buffer on each read operation.

● Host sets interrupt mask to signal an ongoing read.

● We signal the host when the ring transitions from empty to non-empty state and the host is not currently reading.

● … but we can also delay signalling if more data is on the way.
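The transmit-side decision can be sketched as follows. `TxRing`, the `more_coming` argument, and the `deferred` flag are hypothetical; indices grow without wrapping to keep the example short. It shows one way to combine the "empty to non-empty" rule, the host's interrupt mask, and delayed signalling for batches.

```python
class TxRing:
    """Guest -> host ring, reduced to the state needed for the signalling decision."""

    def __init__(self):
        self.read_index = 0      # advanced by the host as it drains the ring
        self.write_index = 0     # advanced by the guest
        self.interrupt_mask = 0  # set by the host while it is reading
        self.deferred = False    # a signal is owed but was postponed

    def send(self, length: int, more_coming: bool, signal_host) -> None:
        was_empty = self.read_index == self.write_index
        self.write_index += length
        # A signal is needed only on the empty -> non-empty transition,
        # and only if the host is not already reading.
        if was_empty and not self.interrupt_mask:
            self.deferred = True
        # Postpone the signal while the caller says more data is on the way.
        if self.deferred and not more_coming:
            self.deferred = False
            signal_host()
```

Sending a batch of packets with `more_coming=True` on all but the last one results in a single signal for the whole batch.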

Page 25: Devconf.cz 2016 Linux as a guest on Hyper-V

Thank you! Questions?

Vitaly Kuznetsov