Computer Architecture course lecture on Virtualization and ...

Virtualization Susanta K. Nanda ECSL CSE 502, Fall’05

1. Virtualization (Susanta K. Nanda, ECSL, CSE 502, Fall '05)

2. Virtualization at the Hardware Level

  • Observation
    • Hardware resources are typically under-utilized
    • Hardware resources directly relate to cost
  • Goal: Improve hardware utilization
  • How?
    • Share hardware resources across multiple machines
    • May make sense for network attached storage, but what about processor, memory, etc.?
  • Theme
    • Decouple machine from hardware
  • Virtual Machine (VM)
    • A machine decoupled from the hardware, i.e. does not necessarily correspond to the hardware
    • Multiple Virtual Machines on the same physical host could share the underlying hardware
    • First VM: IBM System/360 Model 40 VM [1965]

3. Virtual Machine Monitor (VMM)

  • A thin layer of software on top of the bare machine to facilitate virtualization of hardware resources
  • Mediates between VMs and the hardware
  • Manages VMs
    • Create, Destroy, Power Off/On, etc.
  • Concerns
    • Isomorphism: State transitions must be isomorphic to those of a physical machine
    • Isolation: One VM from all others
    • Performance: Close-to-native
    • Correctness: Present exactly the same hardware interface to the guest OS, so that commodity OSes run without any modification

4. A Stolen Picture

5. VM: Additional Advantages

  • Non-existing hardware
    • Virtual devices through emulation via a combination of software and other available devices
    • Example: SCSI-disk using IDE-disks, (virtual) timer
    • Use: Legacy systems/software
  • Hides heterogeneity of the underlying hardware
    • Ability to switch hardware vendors
  • Mobility
    • Decoupling helps move a VM from one physical host to another, just like a file
    • Use: Server consolidation, hardware maintenance, etc.
  • OS Debugging, Mixed OS, Event monitoring, Execution Undo, and Many more

6. Key Concepts: Appearance

  • A VM consists of Shared and Dedicated Hardware
    • Shared: Disk, Memory, NIC, CPU, Printer, etc
    • Dedicated: Keyboard, Mouse, Display, Speakers, CD-Drive, etc
    • A server VM may not require some dedicated devices
  • Dedicated hardware
    • Per user
    • Sharable across multiple VMs if they belong to the same user

7. Key Concepts: State Management

  • Each VM would have its own architected state information
    • Example: registers/memory/disks, page table/TLB
  • Not always possible to map every piece of architected state to its natural level in the host
    • Insufficient/Unavailable host resources
    • Example: Registers of a VM may be architected using main memory in the host
  • VMs keep getting switched in/out by the VMM
    • Isomorphism requires all state transitions to be performed on the VM states
    • Performance requires efficient state management
  • State Management: Indirection vs. Copying

8. Key Concepts: State Management (contd.)

  • Indirection
    • Hold state for each VM in fixed locations in the host's memory hierarchy
    • A pointer managed by the VMM indicating the guest state that is currently active
    • Example: Register block maintained in memory and a processor register pointing to the register block of the currently active VM
    • Pros: Ease of management
    • Cons: Inefficient (mov eax, ebx requires 2 instructions)
  • Copying
    • Copy the VM's state information to its natural level in the memory hierarchy when switched in
    • Copy it back to the original place when switched out
    • Example: Copy all the VM registers to the processor registers
    • Pros: Efficient (most instructions are executed natively)
    • Cons: Copying overhead
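
The trade-off between the two strategies can be sketched with a toy cost model. This is only an illustration: the names `VMState`, `mov_indirect`, `mov_native`, `switch_in` and `switch_out` are hypothetical, and "cost" simply counts modelled host operations rather than real cycles.

```python
# Toy cost model for the two state-management strategies.
# VMState, mov_indirect, mov_native, switch_in and switch_out are
# hypothetical names; "cost" just counts modelled host operations.

class VMState:
    """Architected register state of one VM, kept in host memory."""

    def __init__(self, eax=0, ebx=0):
        self.regs = {"eax": eax, "ebx": ebx}


def mov_indirect(active, dst, src):
    """Indirection: the guest's `mov dst, src` becomes a load and a store
    against the register block that the VMM's pointer selects."""
    tmp = active.regs[src]   # load from the active VM's register block
    active.regs[dst] = tmp   # store back into the register block
    return 2                 # two host operations


def mov_native(cpu, dst, src):
    """Copying: the state is already in the real registers."""
    cpu[dst] = cpu[src]
    return 1                 # one host operation


def switch_in(vm):
    """Copy the VM's registers into the modelled processor registers."""
    return dict(vm.regs), len(vm.regs)


def switch_out(vm, cpu):
    """Copy the processor registers back to the VM's save area."""
    vm.regs.update(cpu)
    return len(cpu)


# Indirection: no switch cost, but every instruction pays double.
vm_a = VMState(eax=1, ebx=7)
indirect_cost = sum(mov_indirect(vm_a, "eax", "ebx") for _ in range(10))

# Copying: pay once per switch, then run natively.
vm_b = VMState(eax=1, ebx=7)
cpu, cost_in = switch_in(vm_b)
native_cost = sum(mov_native(cpu, "eax", "ebx") for _ in range(10))
copying_cost = cost_in + native_cost + switch_out(vm_b, cpu)
```

Under these made-up costs, copying is cheaper once a VM executes more than a handful of instructions per switch; with very frequent switches the balance tips toward indirection.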

9. Key Concepts: Resource Control

  • VMM must maintain overall control of the hardware resources
    • Hardware resources are assigned to VMs when they are created/executed
    • Should have a way to get them back when they need to be assigned to a different VM
    • Similar to multi-programming in OS
  • Privileged Resources
    • Certain resources are accessible only to and managed by VMM
    • Interrupts relating to such resources must then be handled by VMM
    • Privileged resources are emulated by VMM for the VM
    • Example : interval timer
  • All resources that could help maintain control are marked privileged
    • Interval timer is used to decide VM scheduling
    • Page table base register (CR3 on x86) is used to isolate VM memory
  • Issues: VM scheduling (An ideally fair scheduling may not be good)
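
How a privileged interval timer lets the VMM keep control can be sketched as a simple round-robin loop. The function name `run_with_timer`, the tick counts and the round-robin policy are assumptions made for this example; the slides do not prescribe a scheduling policy.

```python
from collections import deque


def run_with_timer(vms, total_ticks, quantum):
    """Round-robin the VMs; the interval timer fires every `quantum` ticks
    and hands control back to the VMM, which picks the next VM."""
    ready = deque(vms)
    timeline = []            # (vm, ticks it ran before the timer fired)
    elapsed = 0
    while elapsed < total_ticks and ready:
        vm = ready.popleft()
        ran = min(quantum, total_ticks - elapsed)
        timeline.append((vm, ran))
        elapsed += ran
        ready.append(vm)     # preempted VM goes to the back of the queue
    return timeline


timeline = run_with_timer(["VM1", "VM2"], total_ticks=5, quantum=2)
```

Because the guest cannot reprogram the real timer, no VM can hold the processor for longer than one quantum in this model.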

10. Key Concepts: Native/Hosted VMs

  • Native VMs
    • VMM is installed on the bare machine, no host OS
    • All other VMs are then created through the VMM
    • Pros: Clean Architecture, Efficient
    • Cons: Complicated VMM due to device drivers
    • Example: VMware ESX Server
  • Hosted VMs
    • VMM is installed on top of a host OS
    • User-mode: VMM runs in non-privileged mode
    • Dual-mode: VMM runs partly in privileged mode (as a driver on the host OS) and partly in unprivileged mode (like an application)
    • Pros: VMM uses drivers in the host OS for I/O, so the VMM stays thin
    • Cons: Inefficient for I/O intensive applications
    • Example: Microsoft Virtual Server

11. Processor Virtualization

  • Privilege Levels/Rings
    • System/User mode
  • System ISA vs. User ISA
  • Emulation
    • Guest ISA may differ from Host ISA
    • Binary translation
    • Slower
  • Native Execution
    • Guest and Host ISA must be the same
    • Some critical instructions may still need to be emulated
    • Issues: Complexity of discovering and emulating critical instructions efficiently

12. ISA Virtualizability

  • Privileged Instructions (PI)
    • Instructions that generate a trap when executed at any but the most-privileged level
    • Example: LIDT (load interrupt descriptor table)
  • Sensitive Instructions (SI)
    • Instructions whose behavior depends on the current privilege level
    • Example: POPF (pops the stack to EFLAGS)
      • In user mode, the Interrupt Enable bit of the EFLAGS register is not overwritten
      • In system mode, the value is blindly copied
  • Popek/Goldberg Theorem
    • For any conventional third-generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions.
    • In other words, an ISA is virtualizable if SI is a subset of PI (the theorem gives a sufficient condition)
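
The subset condition is easy to state as a set check. The sketch below uses only the two instructions named on this slide; treating LIDT as both sensitive and privileged is an assumption of the example, and `check_virtualizable` is a hypothetical helper name.

```python
def check_virtualizable(sensitive, privileged):
    """Return (condition_holds, offenders).

    condition_holds is True when every sensitive instruction is also
    privileged; offenders are the sensitive instructions that do not trap.
    """
    sensitive, privileged = set(sensitive), set(privileged)
    return sensitive <= privileged, sensitive - privileged


# Toy two-instruction ISA fragment built from the examples above.
ok, offenders = check_virtualizable(
    sensitive={"LIDT", "POPF"},
    privileged={"LIDT"},
)

# A fragment where the condition holds.
ok_clean, offenders_clean = check_virtualizable({"LIDT"}, {"LIDT"})
```

Here POPF is the offender: it behaves differently by privilege level but does not trap in user mode, so the VMM never gets a chance to intervene.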

13. When the ISA Is Not Virtualizable

  • All is not lost if an ISA violates Popek/Goldberg theorem
    • However, it brings additional complexity and inefficiency to the VMM implementation
  • Critical instructions:
    • Instructions that are sensitive but not privileged
    • x86 has 17 critical instructions
    • All critical instructions must be emulated by VMM
  • VMM Components
    • Binary Scanner: Inspects and inserts trap at critical instructions
    • Dispatcher: Gets control when a trap occurs
    • Allocator: Allocates machine resources (e.g. load relocation bounds register)
    • Interpreters: Each interpreter interprets one privileged instruction
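
How the scanner, dispatcher and interpreters fit together can be sketched on a toy instruction stream. Everything here is illustrative: instructions are modelled as tuples, the critical set contains only POPF, the allocator is left out, and names such as `scan`, `dispatch` and `interpret_popf` are hypothetical.

```python
IF_BIT = 0x200  # interrupt-enable flag, bit 9 of EFLAGS

CRITICAL = {"POPF"}  # toy critical set; a real VMM would list all of them


def scan(code):
    """Binary scanner: wrap every critical instruction in a TRAP marker."""
    return [("TRAP", ins) if ins[0] in CRITICAL else ins for ins in code]


def interpret_popf(vm, ins):
    """Interpreter for POPF: the guest kernel expects system-mode behaviour,
    so the whole popped value goes into the VM's virtual EFLAGS."""
    vm["eflags"] = vm["stack"].pop()


INTERPRETERS = {"POPF": interpret_popf}


def dispatch(vm, ins):
    """Dispatcher: gets control on a trap and calls the right interpreter."""
    INTERPRETERS[ins[0]](vm, ins)


def run(vm, code):
    """Run scanned code; returns how many traps the dispatcher handled."""
    traps = 0
    for ins in scan(code):
        if ins[0] == "TRAP":
            dispatch(vm, ins[1])
            traps += 1
        elif ins[0] == "PUSH":
            vm["stack"].append(ins[1])   # innocuous: executes directly
        else:
            raise ValueError(f"unknown instruction {ins[0]!r}")
    return traps


vm = {"stack": [], "eflags": 0}
traps = run(vm, [("PUSH", IF_BIT), ("POPF",)])
```

Only the POPF takes the slow path through the dispatcher; the PUSH runs without VMM involvement, which is the point of scanning rather than interpreting everything.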

14. Memory Virtualization

  • VM support in traditional architectures
    • Architected TLB vs. Architected Page Table
    • Page-fault and Swap
    • One level of indirection: Page Table
  • VMM requires two levels of indirection
    • Virtual Memory to Real Memory: Page Table (Guest OS)
    • Real Memory to Physical Memory: Real Map Table (VMM)
  • Architected Page Table
    • Additional Data Structures
      • Real Map Table (VMM)
      • Shadow Page Tables (VMM): Used by hardware for address translation, directly maps virtual address to physical (not real) address
    • Maintenance:
      • VMM intercepts and emulates Page table modifications, Page table base register modifications by the Guest OS
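
A shadow page table is the composition of the guest's page table with the VMM's real map table. The sketch below models both as dictionaries keyed by page number; the 4096-byte page size and the names `build_shadow`, `on_guest_pte_write` and `translate` are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this example


def build_shadow(guest_page_table, real_map):
    """Compose virtual->real (guest OS) with real->physical (VMM).

    Guest pages whose real page has no physical backing are left out,
    so touching them faults into the VMM.
    """
    return {
        vpage: real_map[rpage]
        for vpage, rpage in guest_page_table.items()
        if rpage in real_map
    }


def on_guest_pte_write(guest_page_table, real_map, shadow, vpage, rpage):
    """VMM intercepts a guest page-table update and keeps the shadow in sync."""
    guest_page_table[vpage] = rpage
    if rpage in real_map:
        shadow[vpage] = real_map[rpage]
    else:
        shadow.pop(vpage, None)


def translate(shadow, vaddr):
    """What the hardware does: virtual address straight to physical."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in shadow:
        raise LookupError("page fault: the VMM must resolve it")
    return shadow[vpage] * PAGE_SIZE + offset


guest_pt = {0: 5, 1: 6, 2: 9}     # virtual page -> real page
real_map = {5: 40, 6: 41}         # real page -> physical page
shadow = build_shadow(guest_pt, real_map)
paddr = translate(shadow, 1 * PAGE_SIZE + 12)
on_guest_pte_write(guest_pt, real_map, shadow, vpage=2, rpage=5)
```

After the intercepted write, virtual page 2 maps to physical page 40 in the shadow table even though the guest only ever wrote real page 5.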

15. Memory Virtualization (contd.)

  • Architected TLB
    • Virtual TLB: maintained by guest OS
      • Virtual ASID, Virtual Page, Real Page
    • Real TLB: maintained by VMM
      • Real ASID, Virtual Page, Physical Page
    • ASID map table
      • Virtual ASID, Real ASID
    • VMM intercepts/emulates all modifications to TLB by the guest OS
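
The three tables can be sketched as dictionaries, with an intercepted guest TLB write updating all of them. This is a single-guest simplification (a real VMM would keep an ASID map per VM), and `TlbVirtualizer` and its method names are hypothetical.

```python
class TlbVirtualizer:
    """Keeps the virtual TLB, real TLB and ASID map table consistent."""

    def __init__(self, real_map):
        self.real_map = real_map   # real page -> physical page (VMM)
        self.asid_map = {}         # virtual ASID -> real ASID
        self.virtual_tlb = {}      # (virtual ASID, virtual page) -> real page
        self.real_tlb = {}         # (real ASID, virtual page) -> physical page
        self._next_real_asid = 0

    def real_asid(self, virtual_asid):
        """Look up, or allocate, the real ASID for a guest's virtual ASID."""
        if virtual_asid not in self.asid_map:
            self.asid_map[virtual_asid] = self._next_real_asid
            self._next_real_asid += 1
        return self.asid_map[virtual_asid]

    def guest_tlb_write(self, virtual_asid, vpage, rpage):
        """Emulate an intercepted guest TLB write."""
        # What the guest believes it wrote.
        self.virtual_tlb[(virtual_asid, vpage)] = rpage
        # What the hardware TLB actually gets.
        key = (self.real_asid(virtual_asid), vpage)
        self.real_tlb[key] = self.real_map[rpage]


tlb = TlbVirtualizer(real_map={5: 40, 6: 41})
tlb.guest_tlb_write(virtual_asid=3, vpage=0, rpage=5)
tlb.guest_tlb_write(virtual_asid=7, vpage=0, rpage=6)
```

The two guest address spaces use the same virtual page but land in different real-TLB entries, because the ASID map gives each a distinct real ASID.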

16. I/O Virtualization

  • Virtualizing Devices
    • Dedicated Devices: Display, Keyboard, Mouse, etc.
    • Partitioned Devices: Disk
    • Shared Devices: Network adapter
    • Spooled Devices: Printer
    • Non-existent Physical Devices: virtual network adapter
  • Virtualizing I/O Operations
    • Intercepting/emulating IN/OUT, INS/OUTS
    • Map virtual resource ID to physical device ID
    • De-multiplexing the interrupts for the devices
  • Virtualizing I/O in Hosted VMM
    • VMM-driver translates I/O instructions back to system calls in the host OS
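
The three steps under "Virtualizing I/O Operations" can be sketched as a small monitor object. The class `IoMonitor`, its methods, and the port numbers are hypothetical; the `device` dictionary merely stands in for real hardware registers.

```python
class IoMonitor:
    """Intercepts port I/O, maps virtual ports to physical ones,
    and demultiplexes device interrupts back to the owning VMs."""

    def __init__(self):
        self.port_map = {}   # (vm, virtual port) -> physical port
        self.device = {}     # physical port -> last value written
        self.pending = {}    # vm -> virtual ports with a pending interrupt

    def attach(self, vm, vport, pport):
        self.port_map[(vm, vport)] = pport

    def out(self, vm, vport, value):
        """Emulate an intercepted OUT instruction."""
        pport = self.port_map[(vm, vport)]
        self.device[pport] = value
        return pport

    def inp(self, vm, vport):
        """Emulate an intercepted IN instruction."""
        return self.device.get(self.port_map[(vm, vport)], 0)

    def interrupt(self, pport):
        """Route a physical interrupt to every VM mapped to that port."""
        for (vm, vport), mapped in self.port_map.items():
            if mapped == pport:
                self.pending.setdefault(vm, []).append(vport)


mon = IoMonitor()
mon.attach("VM1", 0x3F8, 0x2F8)   # VM1's port 0x3F8 is really 0x2F8
mon.attach("VM2", 0x3F8, 0x3F8)
used = mon.out("VM1", 0x3F8, 65)
mon.interrupt(0x2F8)
```

Both guests address the same virtual port, yet VM1's write and the resulting interrupt never become visible to VM2.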

17. Performance Degradation in VMMs

  • Setup: VM State initialization
  • Emulation: Emulating critical instructions
  • Interrupt Handling
    • Interrupts generated by a program within a VM have to be handled first by the VMM, even when that is not strictly required
  • State Saving: During world switches
  • Bookkeeping: Timers, etc
  • Time Elongation: Memory references take longer

18. VT-x: Vanderpool Technology

  • VMX Mode for Processors
    • VMX Root and VMX Non-root
    • All four privilege levels (rings) are available in both root and non-root in VMX mode
      • Thus, four new privilege levels, each less privileged than the Pentium's existing ones
    • Guest VMs can run in VMX non-root
    • The host OS (for a hosted VMM) and the VMM run in VMX root
  • VMX instructions
    • VMX root has access to a new set of instructions
    • Critical shared resources are kept under the control of a monitor in VMX root
    • VMX non-root ring 0 does not have access to the critical resources
    • An example of a critical resource: Memory for state management

19. An Example Operation

  • VMXON: Switch into VMX mode, to the VMM
  • VMLAUNCH VM1: Start executing VM1 in VMX non-root operation
  • VM1 exits: Go back to VMM
  • VMLAUNCH VM2: Start executing VM2
  • VM2 exits: Go back to VMM
  • VMRESUME VM2: Switch to VM2 again
  • VM2 exits: Go back to VMM
  • VMRESUME VM2: Switch to VM2
  • VMRESUME VM1: VM2 exits, VM1 switched in
  • VM1 exits: Go back to VMM
  • VMXOFF: Get back to regular mode
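
The sequence above can be replayed on a toy state machine that tracks which mode the processor is in. This is a model of the slide, not of the hardware: `ToyVmx` is a hypothetical class, and the `vm` argument stands in for whichever VMCS is current, since the real VMLAUNCH and VMRESUME instructions take no operand.

```python
class ToyVmx:
    """Tracks regular / VMX root / VMX non-root mode across VMX operations."""

    def __init__(self):
        self.mode = "regular"
        self.running = None      # VM executing in non-root, if any
        self.launched = set()    # VMs that have been launched at least once

    def _require(self, mode):
        if self.mode != mode:
            raise RuntimeError(f"needs {mode} mode, but CPU is in {self.mode}")

    def vmxon(self):
        self._require("regular")
        self.mode = "root"

    def vmlaunch(self, vm):
        self._require("root")
        if vm in self.launched:
            raise RuntimeError(f"{vm} was already launched; use vmresume")
        self.launched.add(vm)
        self.mode, self.running = "non-root", vm

    def vmresume(self, vm):
        self._require("root")
        if vm not in self.launched:
            raise RuntimeError(f"{vm} was never launched; use vmlaunch")
        self.mode, self.running = "non-root", vm

    def vm_exit(self):
        self._require("non-root")
        self.mode, self.running = "root", None

    def vmxoff(self):
        self._require("root")
        self.mode = "regular"


cpu = ToyVmx()
cpu.vmxon()
cpu.vmlaunch("VM1"); cpu.vm_exit()
cpu.vmlaunch("VM2"); cpu.vm_exit()
cpu.vmresume("VM2"); cpu.vm_exit()
cpu.vmresume("VM2"); cpu.vm_exit()
cpu.vmresume("VM1"); cpu.vm_exit()
cpu.vmxoff()
```

Every switch between VMs passes through VMX root in this model, which is where the VMM gets to make its scheduling decision.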

20. Maintenance of State

  • VMCS Data Structure
    • Fully specified, various fields defined
    • Manipulated only by hardware or software in VMX root
    • VMPTR points to the VMCS structure of the currently executing VM
    • There can be multiple VMs active at any point, but only one of them would be executing
    • VMWRITE/VMREAD to write/read the contents of the VMCS
    • State: More than normal, e.g. architecturally hidden part of segment registers
  • Control Fields: Define under what condition a VM exits
    • Example: Some specific interrupt/instruction/etc, number of model-specific registers (MSRs) that need to be saved when VM exits
  • VM exit info
    • Informs the VMM the reason for exit along with supporting info

21. Maintenance of State (contd.)

  • State Area
    • Guest State: Register state, Interruptibility state
    • Host State: Register State
  • Control Area
    • VM Execution Controls
      • Pin/Processor-based execution controls, bitmap fields, etc
    • VM Exit Controls
      • Control bitmap, MSR Controls
    • VM Entry Controls
      • Control bitmap, MSR Controls, Controls for Event Injection
  • VM Exit Information
    • Basic Info: VM-Exit Info, Vectoring Event Info
    • Other Exit Info: Due to event delivery, due to instruction execution
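
The areas listed above can be pictured as one record with a guarded read/write interface. This is a loose sketch: `ToyVmcs`, the string field names and the `in_root` flag are all hypothetical, and the `PermissionError` only models the rule that the VMCS is manipulated from VMX root; it is not how a real processor reacts.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVmcs:
    # State area
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    # Control area
    execution_controls: dict = field(default_factory=dict)
    exit_controls: dict = field(default_factory=dict)
    entry_controls: dict = field(default_factory=dict)
    # VM exit information
    exit_info: dict = field(default_factory=dict)


def vmwrite(vmcs, area, name, value, in_root=True):
    """Write one field; only allowed from (modelled) VMX root."""
    if not in_root:
        raise PermissionError("VMCS is only accessible from VMX root")
    getattr(vmcs, area)[name] = value


def vmread(vmcs, area, name, in_root=True):
    """Read one field; only allowed from (modelled) VMX root."""
    if not in_root:
        raise PermissionError("VMCS is only accessible from VMX root")
    return getattr(vmcs, area)[name]


def vm_exit(vmcs, reason, guest_regs):
    """Model of a VM exit: save guest state, record why, load host state."""
    vmcs.guest_state.update(guest_regs)
    vmcs.exit_info["reason"] = reason
    return dict(vmcs.host_state)   # state the VMM resumes with


vmcs = ToyVmcs()
vmwrite(vmcs, "host_state", "rip", 0xF000)
resumed = vm_exit(vmcs, reason="HLT", guest_regs={"rip": 0x1234})
```

After the exit, the VMM can read the reason and the saved guest registers from the same structure it will later use to resume the VM.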