Live Migration of Direct-Access Devices
Transcript of Live Migration of Direct-Access Devices
![Page 1: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/1.jpg)
Live Migration of Direct-Access Devices
Asim Kadav and Michael M. Swift
University of Wisconsin - Madison
![Page 2: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/2.jpg)
Live Migration
• Migrating VM across different hosts without noticeable downtime
• Uses of Live Migration
– Reducing energy consumption by hardware consolidation
– Perform non-disruptive hardware maintenance
• Relies on hypervisor mediation to maintain connections to I/O devices
– Shared storage
– Virtual NICs
2
![Page 3: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/3.jpg)
Live Migration with virtual I/O
Hardware Device
Hypervisor + OS
Guest OS
Virtual Driver
Hardware Device
Hypervisor + OS
VMM abstracts the hardware
Device state available to hypervisor.
3
Source Host Destination Host
![Page 4: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/4.jpg)
Access to I/O Devices
Physical Hardware Device
Guest OSVirtual Driver
HypervisorHypervisor + OS
Virtual I/O Architecture
Virtual I/O
• Guest accesses virtual devices• Drivers run outside guest OS• Hypervisor mediates all I/O
Moderate performance Device independence Device sharing Enables migration
4
![Page 5: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/5.jpg)
Direct access to I/O Devices
Hypervisor
Guest OS
Direct I/O Architecture (Pass-through I/O)
Direct I/O• Drivers run in Guest OS• Guest directly accesses
device
Near native performance No migration
Real Driver
Physical Hardware Device
5
![Page 6: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/6.jpg)
Live Migration with Direct I/O
Hypervisor + OS Hypervisor + OS
Device State lost
No Heterogeneous Devices 6
Guest OS
Real Driver
Source Host Destination Host
Hardware Device Hardware Device
![Page 7: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/7.jpg)
Live migration with Direct I/O
• Why not both performance and migration?
– Hypervisor unaware of device state
– Heterogeneous devices/drivers at source and destination
• Existing Solutions :
– Detach device interface and perform migration [Xen 3.3]
– Detach device and divert traffic to virtual I/O [Zhai OLS 08]
– Modify driver and device [Varley (Intel) ISSOO3 08]
7
![Page 8: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/8.jpg)
Overview
• Problem– Direct I/O provides native throughput– Live migration with direct I/O is broken
• Solution– Shadow drivers in guest OS capture device/driver state– Transparently re-attach driver after migration
• Benefits– Requires no modifications to the driver or the device– Supports migration of/to different devices– Causes minimal performance overhead
8
![Page 9: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/9.jpg)
Outline
• Introduction
• Architecture
• Implementation
• Evaluation
• Conclusions
9
![Page 10: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/10.jpg)
Architecture
• Goals for Live Migration
– Low performance cost when not migrating
– Minimal downtime during migration
– No activity executing in guest pre-migration
• Our Solution
– Introduce agent in guest OS to manage migration
– Leverage shadow drivers as the agent [Swift OSDI04]
10
![Page 11: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/11.jpg)
Shadow Drivers
• Kernel agent that monitors the state of the driver
• Recovers from driver failures
• Driver independent
• One implementation per device type
Guest OS Kernel
Shadow Driver
Device Driver
Device
11
Taps
![Page 12: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/12.jpg)
Shadow Driver OperationGuest OS
Kernel
Shadow Driver
Device Driver
Device
• Normal Operation
- Intercept Calls
- Track shared objects
- Log state changing operations
• Recovery
- Proxy to kernel
- Release old objects
- Restart driver
- Replay log
12
Taps
LogTracker
![Page 13: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/13.jpg)
Shadow Drivers for Migration
• Pre Migration– Record driver/device state in driver-independent
way
– Shadow driver logs state changing operations• configuration requests, outstanding packets
• Post Migration– Unload old driver
– Start new driver
– Replay log to configure driver
13
![Page 14: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/14.jpg)
Shadow Drivers for Migration
• Transparency
– Taps route all I/O requests to the shadow driver
– Shadow driver can give an illusion that the device is up
• State Preservation
– Always store only the absolute current state
– No history of changes maintained
– Log size only dependent on current state of the driver
14
![Page 15: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/15.jpg)
Migration with shadow drivers
Guest OS Kernel
Shadow Driver
Source Network
Driver
Source Network Card
Destination Network
Driver
Dest. Network Card
Taps
LogTracker
Source Hypervisor + OS
Destination Hypervisor + OS
15
Source Network
Driver
![Page 16: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/16.jpg)
Outline
• Introduction
• Overview
• Architecture
• Implementation
• Evaluation
• Conclusions
16
![Page 17: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/17.jpg)
Implementation
• Prototype Implementation
– VMM: Xen 3.2 hypervisor
– Guest VM based on linux-2.6.18.8-xen kernel
• New code: Shadow Driver implementation inside guest OS
• Changed code: Xen hypervisor to enable migration
• Unchanged code: Device drivers
17
![Page 18: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/18.jpg)
Changes to Xen Hypervisor
• Migration code modified to allow
– Allow migration of PCI devices
– Unmap I/O memory mapped at the source
– Detach virtual PCI bus just before VM suspension at source and reconnect virtual PCI bus at the destination
– Added ability to migrate between different devices
18
XenHypervisor
Guest VM
PCI Bus
![Page 19: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/19.jpg)
Modifications to the Guest OS
• Ported shadow drivers to 2.6.18.8-xen kernel– Taps
– Object tracker
– Log
• Implemented shadow driver for network devices– Proxy by temporarily disabling device
– Log ioctl calls, multicast address
– Recovery code
19
![Page 20: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/20.jpg)
Outline
• Introduction
• Architecture
• Implementation
• Evaluation
• Conclusions
20
![Page 21: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/21.jpg)
Evaluation
1. Cost when not migrating
2. Latency of Migration
21
![Page 22: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/22.jpg)
Evaluation Platform
• Host machines– 2.2GHz AMD machines
– 1 GB Memory
• Direct Access Devices– Intel Pro/1000 gigabit Ethernet NIC
– NVIDIA MCP55 Pro gigabit NIC
• Tests– Netperf on local network
– Migration with no applications inside VM
– Liveness tests from a third physical host
22
![Page 23: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/23.jpg)
Throughput
698 706.2773
940.5
769
937.7
0
200
400
600
800
1000
Intel Pro/1000 NVIDIA MCP55
Virtual I/O
Direct I/O
Direct I/O (SD)
Thro
ugh
pu
t in
Mb
its/
seco
nd
23
Direct I/O is 33% better than virtual I/O.
Direct I/O with shadow throughput within 1% of original
![Page 24: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/24.jpg)
CPU Utilization
13.80%
17.70%
2.80%
8.03%
4.00%
9.16%
0%
5%
10%
15%
20%
Intel Pro/1000 NVIDIA MCP55
Virtual I/O
Direct I/O
Direct I/O (SD)
CP
U U
tiliz
atio
n in
th
e gu
est
VM
24Direct I/O with shadow CPU utilization within 1% of original
![Page 25: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/25.jpg)
Migration Time
1 2 3 4
Events Timeline
PCI Unplug
VM Migration
PCI Replug
Shadow Recovery
Driver Uptime
ARP Sent
Time (in seconds)
Significant migration latency(56%) is in driver uptime.
25
G
H1 H2
D
Total Downtime = 3.97 seconds
Events TimeLine
![Page 26: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/26.jpg)
Conclusions
• Shadow Drivers are an agent to perform live migration of VMs performing direct access
• Supports heterogeneous devices
• Requires no driver or hardware changes
• Minimal performance overhead and latency during migration
• Portable to other devices, OS and hypervisors
26
![Page 27: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/27.jpg)
Questions
27
Contact : {kadav, swift} @cs.wisc.edu
More details:
http://cs.wisc.edu/~swift/drivers/
http://cs.wisc.edu/~kadav/
![Page 28: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/28.jpg)
Backup Slides
28
![Page 29: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/29.jpg)
Complexity of Implementation
• ~19000 LOCs
• Bulk of this (~70%) are wrappers around functions.
– Can be automatically generated by scripts
29
![Page 30: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/30.jpg)
Migration Time
1 2 3 4
Events Tim
eline
PCI Unplug
VM Migration
PCI Replug
Shadow Recovery
Driver Uptime
ARP Sent
Time (in seconds)
Total downtime = 3.97 seconds
Significant migration latency(56%) is in driver uptime.
30
G
H1 H2
D
![Page 31: Live Migration of Direct-Access Devices](https://reader034.fdocuments.us/reader034/viewer/2022042518/55ab0cd11a28ab980c8b4709/html5/thumbnails/31.jpg)
Migration with shadow drivers
Guest OS Kernel
Shadow Driver
Source Network
Driver
Source Network Card
Dest. Network
Driver
Destination Network Card
Taps
31
LogTracker
Source Hypervisor
+ OS
Destination Hypervisor
+ OS