Services in the Virtualization Plane
Andrew Warfield
Adjunct Professor, UBC
Technical Director, Citrix Systems
The Virtualization Plane
[Diagram: applications running on an OS, on a physical machine.]
• Geoff’s call graph goes on this slide.
20ms in the Linux Kernel
• 355,000 branches
• 350 syscalls, 312 interrupts, 255 page faults
• And this is less than 10% of what happened in that 20ms!
The Virtualization Plane
[Diagram: applications and OS running on a Virtual Machine Monitor, on a physical machine.]
The Virtualization Plane
• Huge opportunity for innovation.
• OS agnostic and hardware agnostic.
• Build useful services that are co‐located, but isolated from VMs.
• Live migration was the first example.
[Diagram: physical machines, each running VMs on a Virtual Machine Monitor, with appliances in the virtualization plane.]
Overview
REMUS: TRANSPARENT HIGH AVAILABILITY
Graduate Student: Brendan Cully
Example 1 (Brendan Cully): Remus: Transparent High Availability
• As with process migration, HA is complicated and difficult to maintain.
• Database HA is generally based around a replicated log, and a recovery protocol based around detailed application semantics.
• HA is for “the very rich and the very scared.”
• Idea: Use simple mechanisms in the virtualization layer to provide universal HA.
Remus
[Diagram: two Xen hosts, each with a Mail Server VM and a PVM, connected to the Internet; labels: 3ms, <17ms, “checkpoint ok!”.]
Remus demonstrates that efficient and complete state capture can provide hardware fault tolerant‐style whole‐system failover to unmodified applications.
The simplicity of the approach is critical, because high availability and fault recovery code is notoriously difficult to get right.
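A minimal sketch of the checkpoint‐then‐release idea behind the statement above. It assumes the design described in the published Remus work rather than anything spelled out on these slides: the primary runs ahead in short epochs, ships the pages dirtied in each epoch to the backup, and holds outgoing network traffic until the backup acknowledges the checkpoint. The classes Primary and Backup and all of their methods are hypothetical names for illustration, not Xen or Remus APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Backup:
    """Hypothetical backup host: holds the last acknowledged checkpoint."""
    state: Dict[str, int] = field(default_factory=dict)

    def receive_checkpoint(self, dirty: Dict[str, int]) -> bool:
        # Apply only the pages dirtied during the epoch that just ended.
        self.state.update(dirty)
        # The return value stands in for the "checkpoint ok!" message.
        return True


@dataclass
class Primary:
    """Hypothetical primary host running the protected VM."""
    backup: Backup
    state: Dict[str, int] = field(default_factory=dict)
    dirty: Dict[str, int] = field(default_factory=dict)
    buffered_output: List[str] = field(default_factory=list)
    released_output: List[str] = field(default_factory=list)

    def write(self, page: str, value: int) -> None:
        # The VM keeps running; we only remember which pages changed.
        self.state[page] = value
        self.dirty[page] = value

    def send(self, packet: str) -> None:
        # Output is held back until the state that produced it is replicated.
        self.buffered_output.append(packet)

    def end_epoch(self) -> None:
        # Ship the dirty pages; release buffered output only once acknowledged.
        if self.backup.receive_checkpoint(dict(self.dirty)):
            self.dirty.clear()
            self.released_output.extend(self.buffered_output)
            self.buffered_output.clear()
```

In this sketch a packet passed to send() is invisible to the outside world until end_epoch() succeeds, so a client never observes output from state that the backup does not also hold. The guest itself needs no changes, which is the sense in which the approach is transparent.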
Remus: Current Work
• Disaster Tolerant Computing. Extend HA/FT to work in the wide area. Deployment between UBC and TRU (~350km fiber connection).
• Exposing Remus to Applications. Apply paravirtualization to transparent HA. E.g. let a database know that some memory is unprotected.
Remus: Summary
• Published at NSDI 2008 – won “Best Paper” award.
• Some patches in xen‐unstable, remaining patches to appear over the next month or two.
• This summer, added support for HVM guests.
PARALLAX: VIRTUAL STORAGE FOR VIRTUAL MACHINES
Graduate Student: Dutch Meyer
Parallax: Storage Virtualization for VMs
• VMs are fantastic, but turn out to be a bit clunky to work with.
• VM images are really big, and most storage systems don’t provide the operations you need to innovate with VMs.
• Horizontal scale: create lots of images based on a gold master.
• Vertical scale: lots of snapshots of a single image.
• Thin provisioning is critical, as is low‐cost storage.
• Parallax is basically just page tables for disks!
Parallax: Storage Virtualization for VMs
[Diagram: Parallax, showing metadata and data.]
• Virtualizes block devices using address mapping trees.
• Very low overhead (2ms) snapshots.
• Equally low overhead image cloning.
• Space efficient.
• Many very interesting challenges.
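The “page tables for disks” idea can be sketched as a small copy‐on‐write mapping tree. This is an illustration under stated assumptions, not Parallax's actual on‐disk format: FANOUT, DEPTH, Node, and VirtualDisk are hypothetical, leaf slots hold the data directly where a real system would hold a physical block address, and every write copies the whole root‐to‐leaf path for simplicity, where a real system would likely copy only nodes shared with a snapshot.

```python
from typing import List, Optional

FANOUT = 4   # entries per tree node (illustrative value only)
DEPTH = 3    # tree levels, so FANOUT ** DEPTH addressable blocks


class Node:
    """One node of the mapping tree; slots hold child nodes or leaf data."""

    def __init__(self, slots: Optional[list] = None):
        self.slots = list(slots) if slots is not None else [None] * FANOUT


class VirtualDisk:
    """A virtual block device defined entirely by the root of its tree."""

    def __init__(self, root: Optional[Node] = None):
        self.root = root if root is not None else Node()

    def _indexes(self, block: int) -> List[int]:
        # Split the virtual block number into one slot index per tree level.
        return [(block // FANOUT ** level) % FANOUT
                for level in reversed(range(DEPTH))]

    def read(self, block: int):
        node = self.root
        for i in self._indexes(block):
            if node is None:
                return None          # unmapped region: thinly provisioned
            node = node.slots[i]
        return node

    def write(self, block: int, data: bytes) -> None:
        # Copy the path from root to leaf; existing nodes are never mutated,
        # so any snapshot that shares them keeps seeing the old contents.
        idx = self._indexes(block)
        new_root = Node(self.root.slots)
        node = new_root
        for i in idx[:-1]:
            child = node.slots[i]
            child = Node(child.slots) if child is not None else Node()
            node.slots[i] = child
            node = child
        node.slots[idx[-1]] = data
        self.root = new_root

    def snapshot(self) -> "VirtualDisk":
        # A snapshot (or clone) is just another handle on the same root.
        return VirtualDisk(self.root)
```

Because snapshot() copies a single reference, its cost does not depend on image size, and cloning from a gold master is the same operation. Blocks that were never written occupy no space in the tree.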
Parallax Summary
• Parallax is used on a daily basis in our lab.
• Vertical integration of storage from array through to guest.
• Horizontal integration across hosts in a cluster.
• Storage is no longer a barrier for deploying new VMs.
• In the process of adding powerful new features: deduping, linearization, and CAS.
TRALFAMADORE: ENHANCING AND UNDERSTANDING SYSTEMS
Graduate Students: Geoffrey Lefebvre and Brendan Cully
Tralfamadore: Motivation
• How much do we really know about what software is doing, especially when things go wrong?
• What if we had a detailed recording?
• What if the recording was interactive and could be queried and changed?
Tralfamadore
1. Continuously log execution for long periods of time.
[Diagram: Production Network and Test/Dev Network; Remus checkpoints of RAM and disk; Parallax.]
Tralfamadore
2. Re‐execute slices of history to generate indexes.
[Diagram: Production Network and Test/Dev Network; Remus checkpoints of RAM and disk; Parallax; Indexing Servers and Execution Index.]
Tralfamadore
3. Queries to search, select, and interrogate history.
[Diagram: Production Network and Test/Dev Network; Remus checkpoints of RAM and disk; Parallax; Indexing Servers and Execution Index; Query Servers.]
“Find points in execution when…”
• “… /etc/passwd was modified.”
• “… network receive buffers were overloaded.”
• “… the stack looked like this.”
• “… control flow ran function b, shortly after running function a.”
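As a rough illustration of the last query above, the sketch below scans a recorded list of call events for places where one function ran shortly after another. Everything here is an assumption made for the example: the CallEvent record, the step counter, the window parameter, and the function ran_shortly_after are hypothetical and do not describe Tralfamadore's actual index or query interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CallEvent:
    step: int        # position in the recorded execution (hypothetical unit)
    function: str    # name of the function that was entered


def ran_shortly_after(trace: Iterable[CallEvent], first: str, second: str,
                      window: int) -> List[Tuple[int, int]]:
    """Return (step of first, step of second) pairs where `second` ran
    within `window` steps of the most recent call to `first`.

    Assumes the trace is ordered by step.
    """
    matches: List[Tuple[int, int]] = []
    last_first = None
    for event in trace:
        if event.function == first:
            last_first = event.step
        elif event.function == second and last_first is not None:
            if event.step - last_first <= window:
                matches.append((last_first, event.step))
    return matches
```

Each returned pair identifies a point in history. In the workflow these slides describe, such a point could then be handed to re‐execution for closer inspection.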
Tralfamadore
4. Re‐execute and analyze modified checkpoints.
[Diagram: Production Network and Test/Dev Network; Remus checkpoints of RAM and disk; Parallax; Indexing Servers and Execution Index; Query Servers; Re‐execution Servers.]
Use piles of existing tools:
• Emulators
• Profilers
• Binary patching
• Debuggers
Use repeated re‐execution to tackle non‐determinism.
Applications of Tralfamadore
• Fault injection / fuzzing of software close to real execution experience.
• Retrospectively attach a debugger to any node in this call graph.
• Test patches in the past!
Understanding Execution
• Prototype application of Tralfamadore performs dynamic analysis of trace data and maps it back on top of source code.
• Allows developers to understand how source is behaving in deployments.
• Very early work, but a few examples follow…
Tralfamadore Summary
• We hope to be able to do detailed, retrospective analysis of system behaviour.
• Current focus has been on understanding execution.
• Future work will involve performance and security analysis and assisting reproduction of system failures.
Overall Conclusions
• The virtualization plane presents a great opportunity to build low‐level extensions to software.
• I have shown three example services, and expect many more to follow.
• Interesting challenges exist in evolving virtualization to provide these services while maintaining isolation and cross‐platform benefits.
Thank You!