Evaluating GPU Passthrough in Xen for High Performance Cloud Computing

Andrew J. Younge (1), John Paul Walters (2), Stephen P. Crago (2), and Geoffrey C. Fox (1)

(1) Indiana University
(2) USC / Information Sciences Institute
Where are we in the Cloud?

• Cloud computing spans many areas of expertise
• Today, we focus only on IaaS and the underlying hardware
• Things we do here affect the entire pyramid!
Motivation

• Need for GPUs on Clouds
  – GPUs are becoming commonplace in scientific computing
  – Great performance-per-watt
• Different competing methods for virtualizing GPUs
  – Remote API for CUDA calls
  – Direct GPU usage within a VM
• Advantages and disadvantages to both solutions
Front-end GPU API

• Translate all CUDA calls into remote method invocations (see the shim sketch below)
• Users share GPUs across a node or cluster
• Can run within a VM, as no GPU hardware is needed, only a remote API
• Many implementations for CUDA
  – rCUDA, gVirtuS, vCUDA, GViM, etc.
• Many desktop virtualization technologies do the same for OpenGL & DirectX
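To make the idea concrete, here is a minimal sketch (not rCUDA's or gVirtuS's actual code) of a shim that shadows a CUDA runtime call inside a VM and forwards it elsewhere; forward_call and its transport are hypothetical stand-ins:

    /* Front-end API sketch: intercept a CUDA runtime call in the guest and
     * forward it to the node that owns the GPU. Build as a shared library
     * and inject with LD_PRELOAD so it shadows the real symbol. */
    #include <stdio.h>
    #include <stddef.h>

    typedef int cudaError_t;  /* stand-in for the CUDA runtime's status enum */

    /* Hypothetical transport: a real system would marshal the arguments and
     * ship them over TCP or InfiniBand to a daemon on the GPU node. */
    static cudaError_t forward_call(const char *fn, const void *args, size_t len) {
        fprintf(stderr, "[front-end] forwarding %s (%zu bytes of args)\n", fn, len);
        return 0;  /* cudaSuccess */
    }

    /* Shadows the real cudaMalloc: no local GPU hardware is ever touched. */
    cudaError_t cudaMalloc(void **devPtr, size_t size) {
        *devPtr = NULL;  /* a real shim would hand back a remote handle */
        return forward_call("cudaMalloc", &size, sizeof size);
    }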
Front-end GPU API
[Diagram: CUDA calls forwarded from guest VMs to a remote GPU node]
Front-end API Limitations

• Can use remote GPUs, but all data goes over the network
  – Can be very inefficient for applications with non-trivial memory movement
• Usually doesn't support the CUDA extensions to C (see the sketch below)
  – Have to separate CPU and GPU code
  – Requires a special decoupling mechanism
• Not a drop-in replacement for existing applications
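To illustrate the decoupling problem, a hypothetical example: the device code below is compiled into the local binary, and the <<< >>> launch is compiler syntax rather than a plain library call, so a front-end must extract the kernel (e.g., as PTX) and launch it remotely via the driver API:

    #include <cuda_runtime.h>

    /* Device code: embedded in this executable's fat binary, invisible to a
     * simple call-forwarding shim running on another machine. */
    __global__ void scale(float *x, float a) {
        x[threadIdx.x] *= a;
    }

    int main() {
        float *d_x;
        cudaMalloc(&d_x, 256 * sizeof(float));
        scale<<<1, 256>>>(d_x, 2.0f);  /* compiler sugar, not an ordinary call */
        cudaDeviceSynchronize();
        cudaFree(d_x);
        return 0;
    }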
Direct GPU Passthrough

• Allow VMs to directly access GPU hardware
• Enables CUDA and OpenCL code
• Utilizes PCI passthrough of the device to the guest VM (setup sketch below)
  – Uses hardware-directed I/O virtualization (VT-d or IOMMU)
  – Provides direct isolation and security of the device
  – Removes host overhead entirely
• Similar to what Amazon EC2 uses
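A minimal sketch of the dom0 side, assuming a hypothetical GPU at PCI address 0000:01:00.0 (find yours with lspci); exact steps vary across Xen versions:

    # Detach the GPU from dom0 and mark it assignable via xen-pciback.
    modprobe xen-pciback
    xl pci-assignable-add 0000:01:00.0

    # In the domU's xl config file:
    #   pci = [ '0000:01:00.0,permissive=1' ]
    # The guest then loads the stock NVIDIA driver and runs CUDA/OpenCL
    # directly on the hardware, with VT-d/IOMMU enforcing isolation.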
Direct GPU Passthrough
[Diagram: PCI passthrough of the GPU to a guest VM]
Hardware Setup

                  Sandy Bridge + Kepler    Westmere + Fermi
CPU (cores)       2x E5-2670 (16)          2x X5660 (12)
Clock Speed       2.6 GHz                  2.6 GHz
RAM               48 GB                    192 GB
NUMA Nodes        2                        2
GPU               1x Nvidia Tesla K20m     2x Nvidia Tesla C2075

Type              Linux Kernel    Linux Distro
Native Host       2.6.32-279      CentOS 6.4
Xen Dom0 (4.2.2)  3.4.53-8        CentOS 6.4
DomU Guest VM     2.6.32-279      CentOS 6.4
SHOC Benchmark Suite

• Developed by the Future Technologies Group @ Oak Ridge National Laboratory
• Provides 70 benchmarks (example run below)
  – Synthetic micro-benchmarks
  – 3rd-party applications
  – OpenCL and CUDA implementations
• Represents a well-rounded view of GPU performance
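For reference, a typical SHOC invocation (a sketch; flags and paths vary by SHOC version, and -s 4 selecting the largest problem size is an assumption to check against your copy's driver script):

    # Build, then run the CUDA benchmarks at the largest problem size.
    ./configure && make
    perl tools/driver.pl -cuda -s 4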
[Result plots: SHOC benchmark performance, native host vs. Xen VM]
Initial Thoughts

• Raw GPU compute performance is impacted by less than 1% in VMs compared to the base system
  – An excellent sign for supporting GPUs in the Cloud
• However, overhead occurs during large transfers between CPU & GPU (probe sketch below)
  – Much higher overhead on the Westmere/Fermi test architecture
  – Around 15% overhead in the worst-case benchmark
  – Sandy Bridge/Kepler overhead is lower
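A minimal host-to-device bandwidth probe, sketching the kind of transfer this overhead shows up in (not SHOC's own code):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 256 << 20;    /* one 256 MiB transfer */
        void *host, *dev;
        cudaMallocHost(&host, bytes);      /* pinned host memory */
        cudaMalloc(&dev, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("Host-to-device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }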
[Result plots: CPU-GPU data transfer performance, native host vs. Xen VM]
Discussion

• GPU passthrough is possible in Xen!
  – Results show that high-performance GPU computation is a reality with Xen
• Overhead is minimal for GPU computation
  – Sandy Bridge/Kepler: < 1.2% overall overhead
  – Westmere/Fermi: < 1% computational overhead, but 7-25% PCIe overhead
• PCIe overhead is likely not due to the VT-d mechanisms
  – It instead points to the NUMA configuration of the Westmere CPU architecture (see the numactl sketch below)
• GPU PCI passthrough performs better than front-end remote API solutions
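One way to test the NUMA hypothesis: pin a transfer benchmark to the socket closest to the GPU and compare against an unpinned run. Node 0 and the benchmark path are assumptions:

    # Keep CPU and memory on the NUMA node nearest the GPU, then rerun
    # the PCIe transfer benchmark and compare bandwidth.
    numactl --cpunodebind=0 --membind=0 ./bin/Serial/CUDA/BusSpeedDownload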
Future Work

• Support PCI passthrough in a cloud IaaS framework: OpenStack Nova (config sketch below)
  – Work for both GPUs and other PCI devices
  – Show performance better than EC2
• Resolve the NUMA issues with the Westmere architecture and Fermi GPUs
• Evaluate GPU possibilities for other hypervisors
• Support large-scale distributed CPU+GPU computation in the Cloud
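For context, a sketch of how later OpenStack releases came to expose PCI passthrough (not this work's prototype; the vendor/product IDs and flavor name are hypothetical):

    # nova.conf on the compute node: whitelist the GPU and give it an alias.
    pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "1028" }
    pci_alias = { "vendor_id": "10de", "product_id": "1028", "name": "gpu" }

    # Request one passthrough GPU for instances booted with this flavor.
    nova flavor-key g1.large set "pci_passthrough:alias"="gpu:1"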
Conclusion

• GPUs are here to stay in scientific computing
  – Many petascale systems use GPUs
  – A GPU-based exascale machine is expected (circa 2020)
• Providing HPC in the Cloud is key to the viability of scientific cloud computing
• OpenStack provides an ideal architecture to enable HPC in clouds
Thanks!

Acknowledgements:
• NSF FutureGrid project
  – GPU cluster hardware
  – FutureGrid team @ IU
• USC/ISI APEX research group
• Persistent Systems Graduate Fellowship
• Xen open source community

About Me:
Andrew J. Younge
Ph.D. Candidate, Indiana University, Bloomington, IN, USA
Email – [email protected]
Web – http://ajyounge.com
http://portal.futuregrid.org
Extra Slides
FutureGrid: a Distributed Testbed
[Map: FutureGrid testbed sites connected by the private/public FG network; NID = Network Impairment Device]
OpenStack GPU Cloud Prototype
[Annotated result plots: measured overheads of ~1.25%, ~0.64%, and ~3.62%]
Overhead in Bandwidth

[Plot: bandwidth overhead, native host vs. Xen VM]