Presenter: Hung-Fu Li

HPDS Lab., NKUAS

2009-12-31

vCUDA: GPU Accelerated High Performance Computing in Virtual Machines

Lin Shi, Hao Chen and Jianhua Sun

IEEE 2009

2

Lecture Outline

Abstract 3
Background 4
Motivation 5
CUDA Architecture 7
vCUDA Architecture 8
Experiment Result 13
Conclusion 19

3

Abstract

This paper describes vCUDA, a GPGPU computing solution for virtual machines. The authors state that API interception and redirection provide transparency and high performance to applications. The paper also evaluates the performance overhead of the framework.

4

Background

VM (Virtual Machine)

CUDA (Compute Unified Device Architecture)

API (Application Programming Interface)

API Interception, Redirection

RPC (Remote Procedure Call)

5

Motivation

Virtualization may be the simplest solution for heterogeneous computing environments.

Hardware varies by vendor, and VM developers need not implement hardware drivers for each device. (Because of licensing, vendors do not publish their source code or core techniques.)

6

Motivation ( cont. )

Currently, virtualization supports only accelerated graphics APIs such as OpenGL (through VMGL), which are not intended for general-purpose computation.

7

CUDA Architecture

Component Stack (top to bottom)

User Application

<< CUDA Extensions to C >>

CUDA Runtime API

CUDA Driver API

CUDA Driver

CUDA Enabled Device

8

vCUDA Architecture

Split the stack into hardware and software bindings.

User Application

CUDA Runtime API

CUDA Driver API

CUDA Driver

CUDA Enabled Device

Hard binding: direct communication

Soft binding: part of SDK

9

vCUDA Architecture ( cont. )

Regroup the stack into host and remote sides.

CUDA Enabled Device

[v]CUDA Driver API

[v]CUDA Runtime API

CUDA Driver

User Application


CUDA Driver API

Host binding

Remote binding (guest OS)

Part of SDK

[v]CUDA Enabled Device (vGPU)

10


Use a fake API as an adapter between the instant driver and the virtual driver.

API Interception

Parameters passed

Order Semantics

Hardware State

Communication

Use Lazy-RPC transmission.

Use XML-RPC for high-level communication (for the cross-platform requirement).

[v]CUDA Driver API

[v]CUDA Runtime API

Remote binding (guest OS)

[v]CUDA Enabled Device (vGPU)
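
The interception idea on this slide can be illustrated with a minimal sketch. It is not vCUDA's implementation: `FakeCudaApi` and `HostStub` are hypothetical names, the host stub only logs calls instead of invoking a real CUDA library, and the transport is a direct function call rather than RPC. The sketch shows a guest-side object that accepts any API name, captures the parameters passed, and forwards calls in their original order.

```python
from typing import Any, Callable, List, Tuple

Call = Tuple[str, Tuple[Any, ...]]


class HostStub:
    """Hypothetical host-side stub; a real one would call the native API."""

    def __init__(self) -> None:
        self.log: List[Call] = []

    def dispatch(self, name: str, args: Tuple[Any, ...]) -> str:
        # Record the call in arrival order instead of touching a GPU.
        self.log.append((name, args))
        return f"{name}: ok"


class FakeCudaApi:
    """Hypothetical guest-side library exposing the native call names."""

    def __init__(self, transport: Callable[[str, Tuple[Any, ...]], str]) -> None:
        self._transport = transport

    def __getattr__(self, name: str) -> Callable[..., str]:
        # Any unknown attribute is treated as an API call to intercept:
        # capture its parameters and redirect them through the transport.
        def intercepted(*args: Any) -> str:
            return self._transport(name, args)

        return intercepted


host = HostStub()
cuda = FakeCudaApi(host.dispatch)
cuda.cudaMalloc(1024)
cuda.cudaMemcpy("dst", "src", 1024)
print(host.log)
```

The application code keeps calling the familiar names, while the order of calls and their arguments reach the host side unchanged.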

11


Virtual Machine OS

Host OS

Lazy RPC

Non-instant API

Instant API

12


vCUDA API with virtual GPU

Lazy RPC

Reduce the overhead of switching between host OS and guest OS.

AP

Lazy RPC

vGPU

Hardware states

API Invocation

GPU

Instant API call

Non-instant API call

Non-instant Package

Stub

vStub
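
The batching behind Lazy RPC can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `LazyRpcClient`, `HostStub`, and the `INSTANT_CALLS` set are hypothetical, and which calls count as instant is assumed here because the slides do not list them. Non-instant calls are queued into a package on the guest side, and the package is sent in one round trip when an instant call arrives.

```python
from typing import Any, List, Tuple

Call = Tuple[str, Tuple[Any, ...]]

# Assumed classification for illustration; the slides do not enumerate it.
INSTANT_CALLS = {"cudaThreadSynchronize"}


class HostStub:
    """Hypothetical host-side stub that counts guest-to-host switches."""

    def __init__(self) -> None:
        self.round_trips = 0
        self.executed: List[Call] = []

    def execute_batch(self, batch: List[Call]) -> None:
        # One switch between guest OS and host OS handles the whole package.
        self.round_trips += 1
        self.executed.extend(batch)


class LazyRpcClient:
    """Hypothetical guest-side client that defers non-instant calls."""

    def __init__(self, stub: HostStub) -> None:
        self._stub = stub
        self._pending: List[Call] = []

    def call(self, name: str, *args: Any) -> None:
        # Queue every call so order is preserved; flush on an instant call.
        self._pending.append((name, args))
        if name in INSTANT_CALLS:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._stub.execute_batch(self._pending)
            self._pending = []


stub = HostStub()
client = LazyRpcClient(stub)
client.call("cudaConfigureCall", 16, 256)
client.call("cudaSetupArgument", 0, 4)
client.call("cudaLaunch", "kernel")
client.call("cudaThreadSynchronize")
print(stub.round_trips, len(stub.executed))
```

In this sketch four API calls cost a single guest-to-host switch, which is the overhead reduction the slide describes.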

13

Experiment Result

Criteria

Performance

Lazy RPC and Concurrency

Suspend & Resume

Compatibility

14

Experiment Result ( cont. )

MV: Matrix Vector Multiplication Algorithm

StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems

MRRR: Multiple Relatively Robust Representations

GPUmg: Molecular Dynamics Simulation with GPU

19

Conclusion

The authors have developed a CUDA interface for virtual machines that is compatible with the native interface. Data transmission is a significant bottleneck because of XML parsing in the RPC layer. This presentation has briefly covered the main architecture of vCUDA and the idea behind it. The architecture could be extended as a component or solution that lets cloud computing support GPUs.
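
The cost of carrying bulk data over XML-RPC can be seen in a small sketch. It uses Python's standard `xmlrpc.client` module as a stand-in; vCUDA's actual wire format is not given in these slides, and the method name `memcpy_to_device` is hypothetical. Binary payloads are base64-encoded inside XML, so the message is larger than the raw buffer and must be parsed again on the receiving side.

```python
import xmlrpc.client

# Stand-in for a 64 KiB buffer that would be copied to the device.
raw = bytes(range(256)) * 256

# Marshal the call as an XML-RPC request body (binary becomes base64 text).
payload = xmlrpc.client.dumps(
    (xmlrpc.client.Binary(raw),), methodname="memcpy_to_device"
)

# Unmarshal on the receiving side; this is the XML-parsing step.
params, method = xmlrpc.client.loads(payload)

ratio = len(payload) / len(raw)
print(f"raw={len(raw)} bytes, xml={len(payload)} bytes, ratio={ratio:.2f}")
```

The encoded message is at least a third larger than the raw data, before counting the time spent encoding and parsing it.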

20

End of Presentation

Thank you for listening.
