Fido: Fast Inter-Virtual-Machine Communication for ...Fido: Fast Inter-Virtual-Machine Communication...
Transcript of Fido: Fast Inter-Virtual-Machine Communication for ...Fido: Fast Inter-Virtual-Machine Communication...
Fido: Fast Inter-Virtual-Machine Communication for Enterprise
Appliances
Anton Burtsev†, Kiran Srinivasan, Prashanth Radhakrishnan, Lakshmi N. Bairavasundaram,
Kaladhar Voruganti, Garth R. Goodson
†University of Utah,
School of Computing
NetApp, Inc
Enterprise appliances
2
• High performance• Scalable and highly-available access
Network attached storage, routers, etc.
Example Appliance
3
• Monolithic kernel• Kernel components
Problems:
• Fault isolation• Performance isolation• Resource provisioning
Split architecture
4
Benefits of virtualization• High availability
• Fault-isolation• Micro-reboots• Partial functionality in case of failure
• Performance isolation
• Resource allocation• Consolidation and load balancing, VM migration
• Non-disruptive updates• Hardware upgrades via VM migration• Software updates as micro-reboots
• Computation to data migration
5
Main Problem: PerformanceIs it possible to match performance of a monolithic
environment?
6
• Large amount of data movement between components• Mostly cross-core
• Connection oriented (established once)
• Throughput optimized (asynchronous)
• Coarse grained (no one-word messages)
• Multi-stage data processing
• Main cost contributors• Transitions to hypervisor
• Memory map/copy operations
• Not VM context switches (multi-cores)
• Not IPC marshaling
Main Insight: Relaxed Trust Model
• Appliance is built by a single organization• Components:
• Pre-tested and qualified
• Collaborative and non-malicious
• Share memory read-only across VMs!
• Fast inter-VM communication• Exchange only pointers to data
• No hypervisor calls (only cross-core notification)• No memory map/copy operations
• Zero-copy across entire appliance
7
Contributions
• Fast inter-VM communication mechanism
• Abstraction of a single address space for traditional systems
• Case study• Realistic microkernelized network attached storage
8
Design
9
Design Goals
• Performance• High-throughput
• Practicality• Minimal guest system and hypervisor dependencies
• No intrusive guest kernel changes
• Generality• Support for different communication mechanisms in the
guest system
10
Transitive Zero Copy
11
• Goal• Zero-copy across entire appliance• No changes to guest kernel
• Observation• Multi-stage data processing
Pseudo Global Virtual Address Space
12
264
0
Insight: • CPUs support 64-bit address space• Individual VMs have no need in it
Pseudo Global Virtual Address Space
13
264
0
Pseudo Global Virtual Address Space
14
264
0
Transitive Zero Copy
15
Fido: High-level View
16
Fido: High-level View
17
• “c” – connection management• “m” – memory mapping• “s” – cross-VM signaling
IPC Organization
18
• Shared memory ring• Pointers to data
IPC Organization
19
• Shared memory ring• Pointers to data
• For complex data structures• Scatter-gather array
IPC Organization
20
• Shared memory ring• Pointers to data
• For complex data structures• Scatter-gather array
• Translate pointers
IPC Organization
21
• Shared memory ring• Pointers to data
• For complex data structures• Scatter-gather array
• Translate pointers
•Signaling:• Cross-core interrupts (event channels)• Batching and in-ring polling
Fast device-level communication
• MMNet• Link-level
• Standard network device interface
• Supports full transitive zero-copy
• MMBlk• Block-level
• Standard block device interface
• Zero-copy on write
• Incurs one copy on read
22
Evaluation
23
MMNet Evaluation
24
• AMD Opteron with 2 2.1GHz 4-core CPUs (8 cores total)
• 16GB RAM• NVidia 1Gbps NICs• 64-bit Xen (3.2), 64-bit Linux (2.6.18.8)• Netperf benchmark (2.4.4)
Loop NetFront MMNetXenLoop
MMNet: TCP Throughput
0
2000
4000
6000
8000
10000
12000
0.5 1 2 4 8 16 32 64 128 256
Thro
ugh
pu
t (M
bp
s)
Message Size (KB)
Monolithic
Netfront
XenLoop
MMNet
25
MMBlk Evaluation
26
• Same hardware• AMD Opteron with 2 2.1GHz 4-core CPUs (8 cores total)• 16GB Ram• NVidia 1Gbps NICs
• VMs are configured with 4GB and 1GB RAM• 3 GB in-memory file system (TMPFS)• IOZone benchmark
MMNetXenBlkMonolithic
MMBlk Sequential Writes
27
0
100
200
300
400
500
600
4 8 16 32 64 128 256 512 1K 2K 4K
Thro
ugh
pu
t (M
B/s
)
Record Size (KB)
Monolithic
XenBlk
MMBlk
Case Study
28
Network-attached Storage
29
Network-attached Storage
30
• RAM• VMs have 1GB each, except FS VM (4GB)• Monolithic system has 7GB RAM
• Disks : • RAID5 over 3 64MB/s disks
• Benchmark• IOZone reads/writes 8GB file over NFS (async)
Sequential Writes
31
0
10
20
30
40
50
60
70
80
90
4 8 16 32 64 128 256 512 1K 2K 4K
Thro
ugh
pu
t (M
B/s
)
Record Size (KB)
Monolithic
Native-Xen
MM-Xen
Sequential Reads
32
0
10
20
30
40
50
60
70
80
4 8 16 32 64 128 256 512 1K 2K 4K
Thro
ugh
pu
t (M
B/s
)
Record Size (KB)
Monolithic
Native-Xen
MM-Xen
TPC-C (On-Line Transactional Processing)
0
50
100
150
200
250
300
350
Tran
sact
ion
s/m
inu
te (
tpm
C)
Monolithic
MMXen
Native-Xen
33
Conclusions• We match monolithic performance
• “Microkernelization” of traditional systems is possible!
• Fast inter-VM communication• The search for VM communication mechanisms is not over
• Important aspects of design• Trust model
• VM as a library (for example, FSVA)
• End-to-end zero copy• Pseudo Global Virtual Address Space
• There are still problems to solve• Full end-to-end zero copy• Cross-VM memory management• Full utilization of pipelined parallelism
34
Backup Slides
36
Related Work
• Traditional microkernels [L4, Eros, CoyotOS]• Synchronous (effectively thread migration)• Optimized for single-CPU, fast context switch, small
messages (often in registers), efficient marshaling (IDL)• Buffer management [Fbufs, IOLite, Beltway Buffers]
• Shared buffer is a unit of protection• Fast-forward – fast cache-to-cache data transfer
• VMs [Xen split drivers, XWay, XenSocket, XenLoop]• Page flipping, later buffer sharing• IVC, VMCI
• Language-based protection [Singularity]• Shared heap, zero-copy (only pointer transfer)
• Hardware acceleration [Solarflare]• Multi-core OSes [Barrelfish, Corey, FOS]
37