2012/11/28 2
Agenda
• Introduction
  History
  Usage model
• Virtualization overview
  CPU virtualization
  Memory virtualization
  I/O virtualization
• Xen/KVM architecture
  Xen
  KVM
• Some Intel work for OpenStack
  OAT
Virtualization history
• 1960s: IBM - CP/CMS on S/360, VM/370, …
• 1970s-1980s: Silence
• 1998: VMware - SimOS project, Stanford
• 2003: Xen - Xen project, Cambridge
• After that: KVM/Hyper-V/Parallels …
What is Virtualization
• VMM is a layer of abstraction
  Supports multiple guest OSes
  De-privileges each OS to run as a guest OS
• VMM is a layer of redirection
  Redirects the physical platform to virtual platforms: the illusion of many platforms
  Provides a virtual platform to each guest OS
[Figure: VM0, VM1, ..., VMn, each running Apps on a Guest OS, on top of the Virtual Machine Monitor (VMM), which runs on the platform HW (processors, memory, I/O devices)]
Server Virtualization Usage Model
• Server Consolidation
  Benefit: Cost savings
    • Consolidate services
    • Power saving
• Disaster Recovery
  Benefit: Loss savings
    • RAS
    • Live migration
    • Loss relief
• Dynamic Load Balancing
  Benefit: Productivity
  [Figure: four VMs (App 1 to App 4) spread across hosts with CPU usage of 30% and 90%]
• R&D Production
  Benefit: Business agility and productivity
Agenda
• Introduction
• Virtualization overview
  CPU virtualization
  Memory virtualization
  I/O virtualization
• Xen/KVM architecture
• Some Intel work for OpenStack
X86 virtualization challenges
• Ring deprivileging
  Goal: isolate the guest OS from
    • Controlling physical resources directly
    • Modifying VMM code and data
  Ring deprivileging layout
    • VMM runs at fully privileged ring 0
    • Guest kernel runs at
      x86-32: deprivileged to ring 1
      x86-64: deprivileged to ring 3
    • Guest app runs at ring 3
  Ring deprivileging problems
    • Unnecessary faulting
      Some privileged instructions
      Some exceptions
    • Guest kernel protection (x86-64)
• Virtualization holes
  19 instructions
    • SIDT/SGDT/SLDT …
    • PUSHF/POPF …
  Some userspace holes are hard to fix with a software approach
    • Hard to trap, or
    • Performance overhead
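To make the "hard to trap" point concrete, here is a minimal sketch of the POPF hole. It is a toy model, not an emulator: it tracks only the IF bit, and the function name `popf` and its parameters are chosen for illustration. It assumes the protected-mode rule that POPF leaves IF unchanged, without faulting, when CPL is greater than IOPL.

```python
IF_FLAG = 1 << 9  # interrupt-enable flag (IF) is bit 9 of EFLAGS


def popf(current_eflags, popped_value, cpl, iopl):
    """Toy model of how protected-mode POPF treats the IF bit.

    If CPL <= IOPL the popped IF bit takes effect. Otherwise the old IF
    bit is kept and no fault is raised, so a VMM has nothing to trap.
    All other flag bits are ignored by this simplified model.
    """
    if cpl <= iopl:
        new_if = popped_value & IF_FLAG
    else:
        new_if = current_eflags & IF_FLAG  # silently preserved
    return (current_eflags & ~IF_FLAG) | new_if


# Native kernel at ring 0 clears IF as intended.
native = popf(IF_FLAG, 0, cpl=0, iopl=0)
# Deprivileged guest kernel at ring 1: the same instruction silently does nothing.
deprivileged = popf(IF_FLAG, 0, cpl=1, iopl=0)
print(bool(native & IF_FLAG), bool(deprivileged & IF_FLAG))
```

The guest kernel believes interrupts are now disabled, yet the VMM was never notified. This is why such instructions need paravirtualization, binary translation, or hardware assistance.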
X86 virtualization challenges
[Figure: ring deprivileging layout. The Virtual Machine Monitor (VMM) runs at Ring0, the guest kernels of VM 0, VM 1 and VM 2 at Ring1, and the guest apps at Ring3]
Typical X86 virtualization approaches
• Para-virtualization (PV)
  Para-virtualization approach, like Xen
  Modified guest OS is aware of and cooperates with the VMM
  Standardization milestone: Linux 3.0
    • VMI vs. PVOPS
    • Bare metal vs. virtual platform
• Binary Translation (BT)
  Full virtualization approach, like VMware
  Unmodified guest OS
  Translates binary code on the fly
    • Translation blocks with caching
    • Usually used for the kernel, ~80% of native performance
    • Userspace apps run natively as much as possible, ~100% of native performance
    • Overall ~95% of native performance
    • Complicated
    • Involves excessive complexity, e.g., self-modifying code
• Hardware-assisted Virtualization (VT)
  Full virtualization approach assisted by hardware, like KVM
  Unmodified guest OS
  Intel VT-x, AMD-V
  Benefits:
    • Closes virtualization holes in hardware
    • Simplifies VMM software
    • Optimizes for performance
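The "translation block with caching" idea behind binary translation can be sketched as follows. This is a toy under stated assumptions, not how any real translator is implemented: instructions are plain mnemonic strings, the `SENSITIVE` set is an illustrative subset, and `TranslationCache`, `translate_block` and `guest_wrote` are hypothetical names.

```python
SENSITIVE = {"popf", "pushf", "sgdt", "sidt", "sldt"}  # illustrative subset only


def translate_block(block):
    """Rewrite sensitive instructions into calls to a VMM emulation routine."""
    return [("emulate", op) if op in SENSITIVE else ("native", op) for op in block]


class TranslationCache:
    def __init__(self, guest_code):
        self.guest_code = guest_code  # block start address -> list of mnemonics
        self.cache = {}               # block start address -> translated block
        self.translations = 0         # how many times the translator actually ran

    def lookup(self, addr):
        if addr not in self.cache:
            self.cache[addr] = translate_block(self.guest_code[addr])
            self.translations += 1
        return self.cache[addr]

    def guest_wrote(self, addr, new_block):
        """Self-modifying code: the stale translation must be discarded."""
        self.guest_code[addr] = new_block
        self.cache.pop(addr, None)


tc = TranslationCache({0x1000: ["mov", "popf", "ret"]})
first = tc.lookup(0x1000)
tc.lookup(0x1000)                       # served from the cache, no retranslation
tc.guest_wrote(0x1000, ["mov", "ret"])  # invalidates the cached block
second = tc.lookup(0x1000)
```

The `guest_wrote` path hints at why self-modifying code adds complexity: every guest write to translated code has to be detected so that the cache never serves a stale block.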
Memory virtualization challenges
• Guest OS has 2 assumptions
  Expects to own physical memory starting from 0
    • BIOS/legacy OSes are designed to boot from the low 1 MB of the address space
  Expects to own basically contiguous physical memory
    • OS kernel requires a minimum of contiguous low memory
    • DMA requires a certain level of contiguous memory
    • Efficient memory management, e.g., less buddy allocator overhead
    • Efficient TLB, e.g., super page TLB
• MMU virtualization
  How to keep the physical TLB valid
  Different approaches involve different complications and overhead
Memory virtualization challenges
[Figure: the hypervisor sits between the guest pseudo-physical memory of VM1 to VM4 (pages 1 to 5) and the machine physical memory, where the same pages appear in a different order]
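The two guest assumptions (memory starts at 0 and is contiguous) can be met by a per-guest table from pseudo-physical frames to machine frames. The following is a minimal sketch; the class name `GuestMemoryMap`, the 4 KiB page size, and the frame numbers are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class GuestMemoryMap:
    """Toy pseudo-physical to machine frame map for one guest."""

    def __init__(self, machine_frames):
        # Index = guest pseudo-physical frame number (always 0..n-1),
        # value = machine frame number chosen by the hypervisor.
        self.p2m = list(machine_frames)

    def to_machine(self, pseudo_phys_addr):
        pfn, offset = divmod(pseudo_phys_addr, PAGE_SIZE)
        return self.p2m[pfn] * PAGE_SIZE + offset


# Both guests see memory starting at 0 and contiguous, although the
# machine frames backing them are scattered.
vm1 = GuestMemoryMap([7, 2, 9])
vm2 = GuestMemoryMap([4, 8])
print(hex(vm1.to_machine(0x1004)), hex(vm2.to_machine(0x1004)))
```

The same pseudo-physical address resolves to a different machine address in each guest, which is what lets several guests each believe they own memory from address 0.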
Memory virtualization approaches
• Direct page table
  Guest/VMM in the same linear space
  Guest/VMM share the same page table
• Shadow page table
  Guest page table unmodified
    • gva -> gpa
  VMM shadow page table
    • gva -> hpa
  Complication and memory overhead
• Extended page table
  Guest page table unmodified
    • gva -> gpa
    • Full control of CR3, page faults
  VMM extended page table
    • gpa -> hpa
    • Hardware based
    • Good scalability for SMP
    • Low memory overhead
    • Greatly reduces page-fault VM exits
• Flexible choices
  Para-virtualization
    • Direct page table
    • Shadow page table
  Full virtualization
    • Shadow page table
    • Extended page table
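[Figure: address spaces GVA, GPA and HPA. The guest page table maps GVA to GPA, the extended page table maps GPA to HPA, and the shadow page table and direct page table map GVA to HPA]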
Shadow page table
• Guest page table remains unmodified from the guest's point of view
  Translates from gva -> gpa
• Hypervisor creates a new page table for the physical hardware
  Uses hpa in PDE/PTE
  Translates from gva -> hpa
  Invisible to the guest
[Figure: the virtual (guest) page directory and page table, with PDE and PTE entries, are rooted at vCR3; the physical (shadow) page directory and page table are rooted at pCR3]
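A minimal sketch of the shadow page table idea: the VMM intercepts guest page-table updates and keeps a second table, holding host physical pages, that the hardware actually walks. Page tables are flattened to single-level dictionaries here, and the class `ShadowMmu` and its method names are hypothetical.

```python
class ShadowMmu:
    """Toy shadow paging at page-number granularity."""

    def __init__(self, p2m):
        self.p2m = p2m       # gpa page -> hpa page, owned by the VMM
        self.guest_pt = {}   # gva page -> gpa page, what the guest sees (vCR3)
        self.shadow_pt = {}  # gva page -> hpa page, what the hardware walks (pCR3)
        self.traps = 0       # every guest page-table update costs a trap

    def guest_write_pte(self, vpn, gpn):
        """Guest page-table writes are intercepted so the shadow stays in sync."""
        self.traps += 1
        self.guest_pt[vpn] = gpn
        self.shadow_pt[vpn] = self.p2m[gpn]

    def guest_clear_pte(self, vpn):
        """Unmapping must also drop the shadow entry, or it would go stale."""
        self.traps += 1
        self.guest_pt.pop(vpn, None)
        self.shadow_pt.pop(vpn, None)


mmu = ShadowMmu(p2m={0: 5, 1: 3})
mmu.guest_write_pte(0x10, 1)   # guest maps gva page 0x10 to gpa page 1
mmu.guest_write_pte(0x11, 0)
mmu.guest_clear_pte(0x11)
```

The trap counter illustrates where the complication and overhead come from: the VMM must be involved in every page-table update, and it keeps a second copy of each table.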
Extended page table
• Extended page table
  Guest can have full control over its page tables and events
    • CR3, INVLPG, page fault
  VMM controls Extended Page Tables
    • Complicated shadow page table is eliminated
    • Improved scalability for SMP guests
[Figure: Guest CR3 points to the guest page tables, which translate a guest linear address to a guest physical address; the EPT base pointer points to the extended page tables, which translate the guest physical address to a host physical address]
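The two-stage translation can be sketched as below. This is a simplified model with single-level dictionaries instead of multi-level tables; `translate`, `GuestPageFault` and `EptViolation` are illustrative names, with `EptViolation` standing in for the VM exit taken when a guest physical page has no EPT mapping.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class GuestPageFault(Exception):
    """Stands in for a page fault handled by the guest itself."""


class EptViolation(Exception):
    """Stands in for the VM exit taken when a gpa has no EPT mapping."""


def translate(gva, guest_pt, ept):
    vpn, offset = divmod(gva, PAGE_SIZE)
    if vpn not in guest_pt:              # stage 1: guest-controlled, gva -> gpa
        raise GuestPageFault(hex(gva))
    gpn = guest_pt[vpn]
    if gpn not in ept:                   # stage 2: VMM-controlled, gpa -> hpa
        raise EptViolation(hex(gpn * PAGE_SIZE))
    return ept[gpn] * PAGE_SIZE + offset


guest_pt = {0x10: 1}   # owned and edited by the guest
ept = {1: 3}           # owned and edited by the VMM
hpa = translate(0x10020, guest_pt, ept)

# The guest edits its own table without any VMM involvement ...
guest_pt[0x11] = 2
# ... and the VMM is only involved when the gpa itself is unmapped.
try:
    translate(0x11000, guest_pt, ept)
    exited = False
except EptViolation:
    exited = True
```

Unlike shadow paging, nothing has to be kept in sync when the guest changes `guest_pt`, which is why the complicated shadow code can be eliminated.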
I/O virtualization requirements
• I/O device from the OS point of view
  Resource configuration and probe
  I/O request: IO, MMIO
  I/O data: DMA
  Interrupt
• I/O virtualization requires presenting the guest OS driver with a complete device interface
  Presenting an existing interface
    • Software emulation
    • Direct assignment
  Presenting a brand new interface
    • Paravirtualization
[Figure: CPU and device interact through register access, DMA to shared memory, and interrupts]
I/O virtualization approaches
• Emulated I/O
  Software emulates a real hardware device
  VMs run the same driver as for the real hardware device
  Good legacy software compatibility
  Emulation overhead limits performance
• Paravirtualized I/O
  Uses abstract interfaces and stack for I/O services
  FE driver: guest runs virtualization-aware drivers
  BE driver: driver based on a simplified I/O interface and stack
  Better performance than emulated I/O
• Direct I/O
  Directly assigns a device to the guest
    • Guest accesses the I/O device directly
    • High performance and low CPU utilization
  DMA issue
    • Guest programs guest physical addresses
    • DMA hardware only accepts host physical addresses
  Solution: DMA remapping (a.k.a. IOMMU)
    • An I/O page table is introduced
    • DMA engine translates according to the I/O page table
  Some limitations under live migration
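The DMA remapping step can be sketched as a per-device I/O page table consulted on every DMA access. This is a toy model: the class `Iommu`, the `DmaFault` exception, and the device name `"nic0"` are hypothetical, and real I/O page tables are multi-level structures rather than dictionaries.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class DmaFault(Exception):
    """Raised when a device touches a guest physical page it was not given."""


class Iommu:
    def __init__(self):
        self.io_page_tables = {}  # device id -> {gpa page -> hpa page}

    def assign(self, device, gpa_to_hpa):
        """Called by the VMM when a device is assigned to a guest."""
        self.io_page_tables[device] = dict(gpa_to_hpa)

    def translate(self, device, gpa):
        """Applied to every DMA address the device emits."""
        gpn, offset = divmod(gpa, PAGE_SIZE)
        table = self.io_page_tables.get(device, {})
        if gpn not in table:
            raise DmaFault(f"{device}: gpa {gpa:#x} not mapped")
        return table[gpn] * PAGE_SIZE + offset


iommu = Iommu()
iommu.assign("nic0", {0: 7, 1: 2})
# The guest driver programs the device with a guest physical address;
# the IOMMU turns it into the host physical address the bus needs.
hpa = iommu.translate("nic0", 0x1040)
```

Because the table is per device, an assigned device can only reach the memory of the guest it belongs to, which is what makes direct assignment safe.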
Virtual platform models
[Figure: three virtual platform models: Hypervisor Model, Host-based Model and Hybrid Model. Components shown: Hypervisor, U-Hypervisor, Host OS, Service VM, ULM, LKM, Preferred OS with Apps, Guest OS with Guest Apps]
Legend: P = processor management code, M = memory management code, DR = device driver, DM = device model, N = NoDMA
Agenda
• Introduction
• Virtualization
• Xen/KVM architecture
• Some Intel work for OpenStack
Xen Architecture
[Figure: Xen architecture]
• Xen Hypervisor: control interface, hypercalls, event channel, scheduler; processor, memory, and I/O (PIT, APIC, PIC, IOAPIC)
• Domain 0 (XenLinux64): control panel (xm/xend), device models, native device drivers, backend virtual driver
• DomainU (XenLinux64): frontend virtual drivers
• HVM Domain (32-bit) and HVM Domain (64-bit): guest BIOS, unmodified OS, FE drivers, virtual platform, VM exit
• Communication: inter-domain event channels, callback / hypercall
• Privilege labels in the figure: 0P, 1/3P, 3P, 0D, 3D
KVM Architecture
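[Figure: KVM architecture. Windows and Linux guests and Qemu-kvm sit above the Linux kernel with the KVM module; the figure separates root and non-root mode and shows VMCS structures and the virtual components vCPU, vMEM, vTimer, vPIC, vAPIC, vIOAPIC]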
Agenda
• Introduction
• Virtualization
• Xen/KVM architecture
• Some Intel work for OpenStack
Trusted Pools - Implementation
[Figure: OAT-based trusted pools with tboot-enabled hosts]
• User specifies: Mem > 2G, Disk > 50G, GPGPU=Intel, trusted_host=trusted
• OpenStack: EC2 API, OS API, Scheduler with TrustedFilter, Query API to the attestation service
• Attestation server: Privacy CA, Appraiser, Whitelist DB, Whitelist API, Host Agent API, Query API
• Hosts: HW/TXT, Hypervisor / tboot, OS with apps, host agent
• Flows shown: Create, Attest, Report, Query trusted/untrusted, Create VM
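The scheduling decision in the trusted pools flow can be sketched as a host filter. This is a standalone illustration, not the OpenStack TrustedFilter code: the function `trusted_filter`, the dictionary keys, the host names, and the `attest` callable (standing in for the attestation service query) are all hypothetical. The GPGPU constraint is left out for brevity.

```python
def trusted_filter(hosts, request, attest):
    """Keep hosts that meet the resource limits and the requested trust level.

    attest is a callable standing in for the attestation service query;
    it returns "trusted" or "untrusted" for a host name.
    """
    selected = []
    for host in hosts:
        if host["mem_gb"] <= request["mem_gt_gb"]:
            continue
        if host["disk_gb"] <= request["disk_gt_gb"]:
            continue
        wanted = request.get("trusted_host")
        if wanted is not None and attest(host["name"]) != wanted:
            continue
        selected.append(host["name"])
    return selected


hosts = [
    {"name": "node1", "mem_gb": 16, "disk_gb": 200},
    {"name": "node2", "mem_gb": 16, "disk_gb": 200},
    {"name": "node3", "mem_gb": 1, "disk_gb": 200},
]
# Made-up attestation results; unknown hosts are treated as untrusted.
reports = {"node1": "trusted", "node2": "untrusted", "node3": "trusted"}
request = {"mem_gt_gb": 2, "disk_gt_gb": 50, "trusted_host": "trusted"}

chosen = trusted_filter(hosts, request, lambda name: reports.get(name, "untrusted"))
print(chosen)
```

Treating a host with no attestation report as untrusted is a design choice of this sketch: a failure to attest should not place a VM that asked for a trusted host.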