CUDA.NET_2.0

8/12/2019 CUDA.NET_2.0


Page | 1 All rights reserved 2008. Company for Advanced Supercomputing Solutions

CUDA.NET Manual

Reference for programmers

Written by: Mordechai Butrashvily

Date: 17/08/2008

E-mail: [email protected]

Website: http://www.gass-ltd.co.il/products/cuda.net

Revision  Writers                Date        Changes
1.1       Mordechai Butrashvily  17/08/2008  2nd revision, final version for CUDA.NET 1.1
1.0       Mordechai Butrashvily  10/08/2008  First revision



Notice

ALL COMPANY'S DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS,
LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, MATERIALS) ARE BEING PROVIDED
AS IS. THE COMPANY MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE
WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, the company assumes no

responsibility for the consequences of use of such information or for any infringement of patents or

other rights of third parties that may result from its use. No license is granted by implication or

otherwise under any patent or patent rights of the company. Specifications mentioned in this

publication are subject to change without notice. This publication supersedes and replaces all

information previously supplied. Company's products are not authorized for use as critical

components in life support devices or systems without express written approval of the company.

Trademarks

NVIDIA is a trademark or registered trademark of NVIDIA Corporation. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright 2008 - Company for Advanced Supercomputing Solutions Ltd, All rights reserved.

Bosmat 2a Street

Shoham, 73142,

Israel

http://www.gass-ltd.co.il



Contents

Introduction
CUDA.NET basic objects
    Driver Objects
    Data Types
    Working with devices
    Working with device memory
Launching CUDA code
    Working with modules
    Working with functions
    Setting function parameters
    Setting execution configuration
Working with CUFFTDriver
Higher Level Objects
New Object Model
    CUDA Object
    Working with CUFFT
    Working with CUBLAS



Introduction

CUDA.NET is a library that provides the same functionality as the CUDA driver (exposed through a C interface) for .NET-based applications.

This document relates to CUDA.NET 1.1. As the version number implies, the API of CUDA 1.1 is supported.

The library has been tested and runs without problems on CUDA 2.0, but new features that are available as of CUDA 2.0 are not yet supported by CUDA.NET.

As such, it wraps all the functionality of CUDA for .NET. In practical terms:

- Device enumeration
- Context management
- Memory allocation and transfer (including array management)
- Texture management
- Asynchronous data transfer and execution, through streams

In addition, it provides access to the other routines provided by CUDA:

- FFT (1D to 3D)
- BLAS routines

To simplify development of .NET based applications, the library includes data types that

correspond to CUDA specifications, especially vector types:

CUDA.NET                              CUDA
Char1, Char2, Char3, Char4            char1, char2, char3, char4
UChar1, UChar2, UChar3, UChar4        uchar1, uchar2, uchar3, uchar4
Short1, Short2, Short3, Short4        short1, short2, short3, short4
UShort1, UShort2, UShort3, UShort4    ushort1, ushort2, ushort3, ushort4
Int1, Int2, Int3, Int4                int1, int2, int3, int4
UInt1, UInt2, UInt3, UInt4            uint1, uint2, uint3, uint4
Long1, Long2, Long3, Long4            long1, long2, long3, long4
ULong1, ULong2, ULong3, ULong4        ulong1, ulong2, ulong3, ulong4
Float1, Float2, Float3, Float4        float1, float2, float3, float4

The basic primitive types are supported as well (CUDA.NET syntax conforms to C#):

CUDA.NET         CUDA
sbyte, byte      char, unsigned char
short, ushort    short, unsigned short
int, uint        int, unsigned int
long, ulong      long, unsigned long
float            float



CUDA.NET basic objects

As stated in the previous section, CUDA.NET is a wrapper over the CUDA driver.

To ease development and the migration of existing CUDA applications written in C to .NET, the same API was preserved.

The driver API of CUDA can be accessed from .NET through the CUDADriver object of CUDA.NET.

All methods are static, thus allowing direct access to the same functions.

For example, let's consider the following CUDA application written in C:

#include <cuda.h>

int main()
{
    // Initialize the driver.
    cuInit(0);
}

The same code with CUDA.NET looks like this:

using GASS.CUDA;

namespace CUDATest
{
    class Test
    {
        static void Main(string[] args)
        {
            // Initialize the driver.
            CUDADriver.cuInit(0);
        }
    }
}

The same approach can be applied to all other functions of the CUDA driver API.

Driver Objects

The set of basic wrapper objects provided by CUDA.NET is:

CUDADriver - provides access to the CUDA API
CUFFTDriver - provides access to the CUFFT API
CUBLASDriver - provides access to the CUBLAS API and routines

Data Types

Looking into GASS.CUDA.Types namespace reveals some types that were created to support

all features of CUDA from a .NET application:

CUdevice - Represents a pointer to a device object
CUdeviceptr - Represents a pointer to device memory
CUcontext - Represents a pointer to a context object
CUmodule - Represents a pointer to a loaded module object
CUfunction - Represents a pointer to a function in a module
CUarray - Represents a pointer to an allocated array in device memory
CUtexref - Represents a pointer to a texture in device memory
CUevent - Represents a pointer to an event
CUstream - Represents a pointer to a stream that can be used for asynchronous operations

All these objects conform to the declarations in CUDA.

Working with devices

Before starting to perform CUDA operations, we must initialize the driver and select a device to work with. A device is selected by creating a context for it.

An example for that might be:

static void Main(string[] args)
{
    // Initialize the driver; this call must be the first before any CUDA operation!
    CUDADriver.cuInit(0);

    // Get the first device from the driver.
    CUdevice dev = new CUdevice();
    CUDADriver.cuDeviceGet(ref dev, 0);

    // Create a new context with default flags.
    CUcontext ctx = new CUcontext();
    CUDADriver.cuCtxCreate(ref ctx, 0, dev);
}

By creating a context we tell the driver that this is the one to be used throughout all CUDA

operations (we can use attach and detach functions later to manage the context we work

with).

It should be noted that a context is always related to a single device.

Working with device memory

Using pointers from .NET code (with unsafe semantics) is discouraged. For that reason, all functions that accept pointers to device memory receive an object of type "CUdeviceptr" instead.

This keeps .NET code clean and maintains compatibility with the C API of CUDA, since all these objects are declared in the C environment as well.

An example for allocating device memory from .NET:




{
    // Assuming the driver was initialized and a context was created.
    CUdeviceptr p1 = new CUdeviceptr();

    // Allocate 1K of data in device memory.
    CUDADriver.cuMemAlloc(ref p1, 1



CUmodule mod = new CUmodule();
CUDADriver.cuModuleLoad(ref mod, "compute.cubin");

* It is strongly recommended to use a full path for the module file name.

After executing the code above we end up with a module that is loaded by the driver.

The next step is to get a function to execute from that module.

Working with functions

In the previous section we said that the CUDA driver can load modules at run time. The same holds for functions, although functions are hosted by modules.

Once we have a loaded module, and a reference to its object, we can get a reference to one

of its global functions in the following way, using CUDA.NET:

CUfunction func = new CUfunction();
CUDADriver.cuModuleGetFunction(ref func, mod, "compute");

We used the module loaded previously to get a function named compute.

This explains why the declaration of the function in the compute.cu file uses extern "C". nvcc is a C++ compiler, so by default it emits symbols with C++ name mangling. Declaring the function with extern "C" keeps its plain name, so we can specify that name directly when loading the function.

At this point we have a function in hand that is almost ready for execution.

The next step is to set the function's parameters dynamically.

Setting function parameters

Once we have a function, and before it is executed on the GPU, we need to specify some parameters and configuration information.

Investigating the function signature we used ("compute"), we find that it accepts one

parameter, which is a pointer to device memory:

extern "C" __global__ void compute(float4* data);

Before we set parameter information, it is necessary to allocate the memory in the device:

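Float4[] data = new Float4[100];
CUdeviceptr ptr = new CUdeviceptr();
// Marshal.SizeOf works on a type, not on an array instance,
// so multiply the element size by the array length.
CUDADriver.cuMemAlloc(ref ptr, (uint)(Marshal.SizeOf(typeof(Float4)) * data.Length));
// Copy the data to the device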

Setting parameters using CUDA.NET for this function looks like this:

CUDADriver.cuParameterSeti(func, 0, (uint)ptr.Pointer);



But that is not enough. We still need to tell the driver how much memory to reserve for the function parameters:

CUDADriver.cuParameterSetSize(func, 4);

* NOTE: On 32-bit systems, with CUDA code compiled for 32 bit, a pointer parameter is always 4 bytes long. On 64-bit systems, specifically when CUDA code is compiled for 64 bit, pointers are 8 bytes long, so the last parameter of cuParameterSetSize varies with the platform. The pointer size can be obtained at run time from the size of the IntPtr object in .NET (IntPtr.Size).
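The case above has a single pointer parameter, so the size is simply the pointer size. With several parameters, the usual convention in NVIDIA's driver API samples is to place each parameter at an offset aligned to its own size and to pass the final offset as the total size. The following is a minimal sketch of that arithmetic in plain C. The names align_up, param_layout and param_layout_add are hypothetical helpers made up for illustration; they are not part of CUDA or CUDA.NET, and natural alignment is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of alignment (a power of two). */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical bookkeeping structure: bytes used so far by the parameters. */
typedef struct {
    size_t size;
} param_layout;

/* Reserve room for one parameter and return the offset where it belongs.
 * After the last parameter, layout->size is the total parameter size. */
static size_t param_layout_add(param_layout *layout, size_t param_size,
                               size_t param_align)
{
    size_t offset = align_up(layout->size, param_align);
    layout->size = offset + param_size;
    return offset;
}
```

For example, a 4-byte int followed by an 8-byte pointer puts the pointer at offset 8, so the total size is 16 bytes rather than 12.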

Setting execution configuration

One last step remains before executing the code on the GPU: setting the execution configuration for our context (meaning the function to be executed).

With CUDA, execution is divided into grids, which in turn are divided into blocks, which are divided into threads (the basic execution element).

Describing this model is not the goal of this document, as it is covered extensively in the documentation provided by NVIDIA for CUDA.

The driver API provides functions to set each of these parameters:

- Grid size, in blocks
- Block size, in threads

To set the thread count for every block of execution:

CUDADriver.cuFunctionSetBlockShape(func, 64, 8, 1);

This sets the block size to 64 threads in the X axis, 8 threads in the Y axis and 1 in the Z axis, for a total of 512 threads in each block.

It is possible to use only one of the axes by setting the other two dimensions to 1.

To launch the function in a grid:

CUDADriver.cuLaunchGrid(func, 512, 512);

The code above actually executes the function on the GPU, with a configuration of 512 blocks in each of the X and Y axes, for a total of 262,144 blocks and 134,217,728 threads to be executed.
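The totals quoted above follow from multiplying the block dimensions and the grid dimensions. A minimal sketch of that arithmetic in plain C is given below. The types block_shape and grid_shape and the three functions are hypothetical names used only for illustration; they are not part of CUDA or CUDA.NET.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types, not part of CUDA or CUDA.NET. */
typedef struct {
    uint32_t x, y, z;          /* threads per block along each axis */
} block_shape;

typedef struct {
    uint32_t width, height;    /* blocks along the X and Y axes of the grid */
} grid_shape;

/* Threads in one block: the product of the three block dimensions. */
static uint64_t threads_per_block(block_shape b)
{
    /* Every dimension must be at least 1, otherwise the block is empty. */
    assert(b.x >= 1 && b.y >= 1 && b.z >= 1);
    return (uint64_t)b.x * b.y * b.z;
}

/* Blocks in the grid: the product of the two grid dimensions. */
static uint64_t blocks_per_grid(grid_shape g)
{
    assert(g.width >= 1 && g.height >= 1);
    return (uint64_t)g.width * g.height;
}

/* Total number of threads launched for one grid. */
static uint64_t total_threads(block_shape b, grid_shape g)
{
    return threads_per_block(b) * blocks_per_grid(g);
}
```

With a block of 64 x 8 x 1 threads and a grid of 512 x 512 blocks, these functions give 512 threads per block, 262,144 blocks and 134,217,728 threads in total.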

Working with CUFFTDriver

CUFFT routines provided by CUDA allow a programmer to perform FFT calculation in the

GPU.

The same API exposed by including cufft.h is used in CUDA.NET.

For example, let's consider the following code given in the official documentation of CUDA

(written by NVIDIA):

1D Complex-to-Complex Transform


#define NX 256
#define BATCH 10

cufftHandle plan;
cufftComplex *data;
cudaMalloc((void**)&data, sizeof(cufftComplex) * NX * BATCH);

/* Create a 1D FFT plan. */
cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);

/* Use the CUFFT plan to transform the signal in place. */
cufftExecC2C(plan, data, data, CUFFT_FORWARD);

/* Inverse transform the signal in place. */
cufftExecC2C(plan, data, data, CUFFT_INVERSE);

/* Destroy the CUFFT plan. */
cufftDestroy(plan);
cudaFree(data);

Performing the same operations with CUDA.NET looks like this:

using GASS.CUDA;
using GASS.CUDA.Types;
using GASS.CUDA.FFT;
using GASS.CUDA.FFT.Types;
using System.Runtime.InteropServices;

namespace CUFFTTest
{
    class Test
    {
        const int NX = 256;
        const int BATCH = 10;

        static void Main(string[] args)
        {
            // Assume driver is initialized and a context was created.

            // Allocate data for the array.
            CUdeviceptr data = new CUdeviceptr();
            CUDADriver.cuMemAlloc(ref data,
                (uint)(Marshal.SizeOf(typeof(cufftComplex)) * NX * BATCH));

            /* Create a 1D plan. */
            cufftHandle plan = new cufftHandle();
            CUFFTDriver.cufftPlan1D(ref plan, NX,
                CUFFTType.ComplexToComplex, BATCH);

            /* Perform a forward FFT. */
            CUFFTDriver.cufftExecC2C(plan, data, data,
                CUFFTDirection.Forward);

            /* Perform an inverse FFT. */
            CUFFTDriver.cufftExecC2C(plan, data, data,
                CUFFTDirection.Inverse);

            /* Clean resources and free memory. */
            CUFFTDriver.cufftDestroy(plan);
            CUDADriver.cuMemFree(data);
        }
    }
}

The approach can be used with other types of FFT.
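Both versions of the example size the device buffer as the element size times NX times BATCH. The sketch below, in plain C, shows that calculation on its own. The type complex_sample is a stand-in for cufftComplex (two single-precision floats) and fft_buffer_bytes is a hypothetical helper; neither name comes from CUFFT or CUDA.NET.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cufftComplex: a real and an imaginary single-precision part. */
typedef struct {
    float re;
    float im;
} complex_sample;

/* Bytes needed for `batch` complex-to-complex transforms of `nx` points each. */
static size_t fft_buffer_bytes(size_t nx, size_t batch)
{
    assert(nx > 0 && batch > 0);
    return sizeof(complex_sample) * nx * batch;
}
```

For NX = 256 and BATCH = 10 this comes to 20,480 bytes, assuming the usual 4-byte float.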

Higher Level Objects

In the final release of CUDA.NET, three objects were added to simplify development with CUDA.NET:

CUDA - provides all CUDA functionality
CUFFT - provides CUFFT functionality with simplified functions
CUBLAS - simplifies working with CUBLAS routines

All new objects use the respective driver, so backward compatibility is maintained with

previous versions.

To provide better feedback about what happened in the driver, each object throws a runtime exception that is specific to its class:

CUDA - CUDAException
CUFFT - CUFFTException
CUBLAS - CUBLASException

The exception is thrown when an error occurs, that is, when the return value from the relevant driver function is different from CUResult.Success.

This behavior is controlled through the UseRuntimeExceptions property, which is true by default.

To turn off runtime exceptions, set this property to false; it can be turned on again later.

New Object Model

The major change was in CUDA, to allow programmers to work easily with CUDA and devices.

A new object-oriented approach was introduced for this purpose. For example, it is possible to enumerate the devices recognized by CUDA simply by accessing the following property of CUDA:

CUDA cuda = new CUDA(true);



foreach (Device dev in cuda.Devices)
{
    Console.WriteLine("{0} -> {1}", dev.Ordinal, dev.Name);
}

The rationale behind the object model was to provide the same API with better syntax and

function names, and to add some useful functions that will improve programming agility.

CUDA Object

This object was created to provide simpler access to CUDA functions, without ref keywords and without an overly low-level API.

Most of the functionality supported by CUDADriver is available through this object, although some functions have not been included yet. They will be added in future releases if the need arises.

Consider memory allocation. A block of device memory can be allocated with a single call:

CUdeviceptr ptr = cuda.Allocate(128);

This fragment of code simply allocates 128 bytes of device memory and returns the

appropriate pointer.

It should be noted at this point, that all functions can still operate with low-level driver

objects, to allow interoperability with CUDADriver object.

Allocating memory for a .NET array can be done like this:

UInt3[] data = new UInt3[128];

CUdeviceptr ptr = cuda.Allocate(data);

The fragment above allocates enough memory for 128 elements of UInt3 vector type, for a

total of 1536 bytes.

Using generic code and some explicit reflection code, the functions compute the amount of memory to allocate, so there is no need to provide such details; only the array to allocate memory for is required.
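The byte counts quoted in this section are the element size multiplied by the number of elements. A minimal sketch in plain C of that calculation follows. The type uint3_like is a stand-in for the UInt3 vector type (three 32-bit unsigned integers) and array_bytes is a hypothetical helper; this is not how CUDA.NET computes the size internally, only the same arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UInt3 / uint3 vector type: three 32-bit unsigned values. */
typedef struct {
    uint32_t x, y, z;
} uint3_like;

/* Bytes of device memory needed for `count` elements of `element_size` bytes. */
static size_t array_bytes(size_t element_size, size_t count)
{
    assert(element_size > 0);
    return element_size * count;
}
```

With a 12-byte element, 128 elements need 1536 bytes and 256 elements need 3072 bytes.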

To ease programming, some further functions were provided to allow allocating memory

and copying data to device memory in a single call:

UInt3[] data = new UInt3[256];

CUdeviceptr ptr = cuda.CopyHostToDevice(data);

This code fragment allocates device memory for 256 elements of UInt3 (a total of 3072 bytes) and copies the array to device memory. Of course, this mechanism can be used with other types of arrays and with the primitives supported by CUDA.NET.



Working with CUFFT

The CUFFT object supports the older functions with a nicer syntax, and also allows performing most FFT operations in a single call.

Creating a 1D plan can be done by:

CUFFT cufft = new CUFFT(new CUDA(true));

cufftHandle plan = cufft.Plan1D(nx, type, batch);

It is also possible to run any of the 1D FFT routines with a single call:

cufftReal[] realData = new cufftReal[256];
cufftComplex[] cmlxData = new cufftComplex[256];
cufft.Execute1D(realData, cmlxData, nx, batch);

The function handles memory management by itself and executes the appropriate FFT based on the provided parameters.

The same holds for all other types of FFT.

Working with CUBLAS

The CUBLAS object was created to simplify working with vector and matrix memory, while all other operations remain accessible from the CUBLASDriver object.

Future versions may bring all supported functions into CUBLAS as well, with simpler signatures.

An example of initializing a vector:

CUDA cuda = new CUDA(true);
CUBLAS blas = new CUBLAS(cuda);
blas.Init();

float[] data = new float[] { 0.0f, 1.5f, 2.5f, 5.224f };
CUdeviceptr vector = blas.Allocate(data);
blas.SetVector(data, vector);

blas.Free(vector);
blas.Shutdown();

The example above demonstrates how to create a vector in device memory and copy data to

be used by one of CUBLAS routines.
