OpenCL

Sathish Vadhiyar

Sources: OpenCL overview from AMD; OpenCL learning kit from AMD.

Introduction

OpenCL is a programming framework for heterogeneous computing resources

Resources include CPUs, GPUs, Cell Broadband Engine, FPGAs, DSPs

Many similarities with CUDA

Command Queues

A command queue is the mechanism for the host to request that an action be performed by the device (e.g., perform a memory transfer, begin executing a kernel)

Interesting concept of enqueuing kernels and satisfying dependencies using events

A separate command queue is required for each device

Commands within the queue can be synchronous or asynchronous

Commands can execute in-order or out-of-order

Perhaad Mistry & Dana Schaa, Northeastern Univ Computer Architecture Research Lab, with Ben Gaster, AMD © 2011

Example – Image Rotation

See slides 8 and 11-16 of Lecture 5 in the OpenCL University Kit

Synchronization

Synchronization in OpenCL

Synchronization is required if we use an out-of-order command queue or multiple command queues

Coarse synchronization granularity: per-command-queue basis

Finer synchronization granularity: per-OpenCL-operation basis, using events

OpenCL Command Queue Control

Command queue synchronization methods work on a per-queue basis

Flush: clFlush(cl_command_queue)

Sends all commands in the queue to the compute device

No guarantee that they will be complete when clFlush returns

Finish: clFinish(cl_command_queue)

Waits for all commands in the command queue to complete before proceeding (the host blocks on this call)

Barrier: clEnqueueBarrier(cl_command_queue) (deprecated in OpenCL 1.2 in favor of clEnqueueBarrierWithWaitList)

Enqueues a synchronization point that ensures all prior commands in the queue have completed before any further commands execute

OpenCL Events

The synchronization functions above operate only at per-command-queue granularity

OpenCL events are needed to synchronize at the granularity of individual operations (function calls)

Explicit synchronization is required for:

Out-of-order command queues

Multiple command queues

Using User Events

A simple example of user events being triggered and used in a command queue

```c
// Create a user event that will gate the write of buf1
cl_event user_event = clCreateUserEvent(ctx, NULL);
clEnqueueWriteBuffer(cq, buf1, CL_FALSE, ..., 1, &user_event, NULL);
// The write of buf1 is now enqueued and waiting on user_event

X = foo(); // Lots of complicated host processing code

clSetUserEventStatus(user_event, CL_COMPLETE);
// The clEnqueueWriteBuffer to buf1 can now proceed, using the output of foo()
```

Multiple Devices

Multiple Devices

OpenCL can also be used to program multiple devices (CPU, GPU, Cell, DSP, etc.)

OpenCL does not assume that data can be transferred directly between devices, so commands exist only to move data from host to device or from device to host

Copying from one device to another requires an intermediate transfer to the host

OpenCL events are used to synchronize execution on different devices within a context

Compiling Code for Multiple Devices