1 - Introduction to OpenCL


Transcript of 1 - Introduction to OpenCL

Page 1: 1 - Introduction to OpenCL

Introduction to OpenCL

Page 2: 1 - Introduction to OpenCL

Module Overview

• Overview
• OpenCL Architecture & Programming Model
• Basic components for getting started
• Information on tools

Page 3: 1 - Introduction to OpenCL

OVERVIEW

Page 4: 1 - Introduction to OpenCL

OpenCL

• OpenCL – Open Computing Language
• Open standard
  – Royalty free, cross-platform, vendor neutral
• Standard for accessing heterogeneous computational resources
  – GPU, CPU, GPU+CPU or multiple GPUs

Page 5: 1 - Introduction to OpenCL

What is OpenCL : Processor Parallelism

• CPUs: multiple cores driving performance increases
• GPUs: increasingly general-purpose data-parallel computing, improving numerical precision
• Graphics APIs and shading languages
• Multi-processor programming, e.g. OpenMP
• Emerging intersection: OpenCL, heterogeneous computing

OpenCL – Open Computing Language
Open, royalty-free standard for portable, parallel programming of heterogeneous parallel computing CPUs, GPUs, and other processors


Page 6: 1 - Introduction to OpenCL

Design Goals of OpenCL

• Use all computational resources in system
  – Program GPUs, CPUs, Cell, DSP and other processors as peers
  – Support both data- and task-parallel compute models
• Low-level, high-performance but portable
  – Primarily targeted at expert developers
  – Foundation for parallel computing ecosystem
• C-based programming model
• Specify accuracy of floating-point computations
  – IEEE 754 compliant rounding behavior
  – Define maximum allowable error of math functions
• Defines a configuration profile for handheld and embedded devices
• Close integration with OpenGL and other 3D APIs

Page 7: 1 - Introduction to OpenCL

OpenCL

• Interface designed as a graphics-free API
• Software stack
  – High-level language
    • “Extended C” to express parallelism
  – Runtime libraries
    • Allow GPU memory management

Page 8: 1 - Introduction to OpenCL

How does it fit with vendor-specific architectures?

Page 9: 1 - Introduction to OpenCL

OPENCL ARCHITECTURE & PROGRAMMING MODEL

Page 10: 1 - Introduction to OpenCL

OpenCL Platform Model

• One Host + one or more Compute Devices
  – Each Compute Device is composed of one or more Compute Units
    • Each Compute Unit is further divided into one or more Processing Elements

Page 11: 1 - Introduction to OpenCL

OpenCL Platform Model

• Computations on a device occur within the processing elements

• An OpenCL application runs on a host and submits commands from the host to execute computations on the processing elements within a device

Page 12: 1 - Introduction to OpenCL

GPU as Co-processor

• GPU as compute device
  – Has its own DRAM (video memory)
  – Can run multiple threads in parallel
• Application runs on host
• The compute-intensive, data-parallel part is sent to the GPU
  – Written as C functions called kernels
  – A kernel is executed on the device simultaneously by multiple threads

Page 13: 1 - Introduction to OpenCL

Programming Model

[Figure: data flow between the host application (Opteron, main memory) and the GPU kernel (FireStream, GPU memory)]
• Host application: load/initialize input data
• Copy input data from host to GPU memory
• GPU kernel: process input data and write to output
• Copy output from GPU to host memory
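
The flow on this slide can be sketched on the CPU alone. The block below is only an emulation under stated assumptions: ordinary heap buffers stand in for GPU memory, memcpy stands in for the host/device transfers, and the names offload_vec_add and device_vec_add are hypothetical, not part of OpenCL.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the GPU kernel: every index is handled independently,
   which is what makes the work data-parallel. */
static void device_vec_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Emulates the copy-in, compute, copy-out flow. The dev_* heap buffers are
   an assumption standing in for GPU memory.
   Returns 0 on success, -1 if an allocation fails. */
int offload_vec_add(const float *host_a, const float *host_b,
                    float *host_c, size_t n)
{
    if (n == 0)
        return 0;               /* nothing to transfer or compute */

    size_t bytes = n * sizeof(float);
    float *dev_a = malloc(bytes);
    float *dev_b = malloc(bytes);
    float *dev_c = malloc(bytes);
    if (!dev_a || !dev_b || !dev_c) {
        free(dev_a);
        free(dev_b);
        free(dev_c);
        return -1;
    }

    memcpy(dev_a, host_a, bytes);            /* copy input: host -> "GPU" memory */
    memcpy(dev_b, host_b, bytes);
    device_vec_add(dev_a, dev_b, dev_c, n);  /* process input, write to output */
    memcpy(host_c, dev_c, bytes);            /* copy output: "GPU" -> host memory */

    free(dev_a);
    free(dev_b);
    free(dev_c);
    return 0;
}
```

In a real OpenCL program the three steps marked in the comments would be issued as commands to the device instead of plain function calls; the ordering is the point of the sketch.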

Page 14: 1 - Introduction to OpenCL

Implicit Data Parallelism

C

// n is the number of elements in each array
void sum(float A[], float B[], float C[], int n)
{
    for (int i = 0; i < n; i++)
    {
        C[i] = A[i] + B[i];
    }
}

C - Rewritten

// Work for a single element, factored out of the loop body
float sum_kernel(int x, float A[], float B[])
{
    return A[x] + B[x];
}

void sum(float A[], float B[], float C[], int n)
{
    for (int i = 0; i < n; i++)
        C[i] = sum_kernel(i, A, B);
}

Page 15: 1 - Introduction to OpenCL

Implicit Data Parallelism

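C – Rewritten 2

// Pseudocode: launch_thread() and sync_threads() are placeholders, not real C functions
float sum_kernel(int x, float A[], float B[])
{
    return A[x] + B[x];
}

void sum(float A[], float B[], float C[], int n)
{
    // Each iteration is independent, so each can run in its own thread
    for (int i = 0; i < n; i++)
        launch_thread(C[i] = sum_kernel(i, A, B));
    // Wait for all threads to finish
    sync_threads();
}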

OpenCL

// Kernel definition
__kernel void vecAdd(__global float* A, __global float* B, __global float* C)
{
    // The host launches a single work-group (local size == global size),
    // so the local ID covers the whole range 0..n-1
    int i = get_local_id(0);
    C[i] = A[i] + B[i];
}

int main()
{
    // Kernel invocation
    size_t globalWorkSize[] = {n};
    size_t localWorkSize[] = {n};
    clEnqueueNDRangeKernel(.., 1, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL);
}

• Kernel invocation from host
• Number of OpenCL threads

Page 16: 1 - Introduction to OpenCL

Kernel

• Each thread has a unique thread ID

__kernel void vecAdd(__global float* A, __global float* B, __global float* C)
{
    int i = get_local_id(0);
    C[i] = A[i] + B[i];
}

• Unique thread ID: accessible within the kernel through an intrinsic function (here get_local_id(0), which is unique within the work-group)
• Function qualifier: the “__kernel” qualifier declares a function as a kernel

Page 17: 1 - Introduction to OpenCL

Work-Group

• Work-items are organized into work-groups
• A group can be a 1D, 2D or 3D array of work-items
  – Specified during kernel invocation
  – Helpful to invoke kernels on matrices, fields
  – Each work-item within a group can be identified by a 1D, 2D or 3D id
    • Built-in function get_local_id()

[Figure: a 2D work-group of 5 x 3 work-items, labelled WI(0, 0) to WI(4, 2)]
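
As a rough illustration of how the IDs relate, the sketch below computes a work-item's global ID from its group and local IDs along one dimension, and flattens a 2D ID into a row-major index. It assumes a zero global offset; the function names are hypothetical helpers written in plain C, not OpenCL built-ins.

```c
#include <stddef.h>

/* Global ID along one dimension, assuming a zero global offset: the groups
   that come before this one contribute group_id * local_size work-items. */
size_t global_id_from_local(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Row-major linear index of work-item (x, y) in a 2D range of the given width */
size_t linear_index_2d(size_t x, size_t y, size_t width)
{
    return y * width + x;
}
```

For the 5 x 3 group on this slide, WI(4, 2) maps to linear index 2 * 5 + 4 = 14, the last of the 15 work-items.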

Page 18: 1 - Introduction to OpenCL

Work-Group

• Example of 2D work-group

// Add two matrices A and B of dimension NxN and store the
// result into C
__kernel void matAdd(int N, __global float* A, __global float* B, __global float* C)
{
    int i = get_local_id(0);
    int j = get_local_id(1);
    C[j * N + i] = A[j * N + i] + B[j * N + i];
}

// host code
int main()
{
    // Declare, allocate and initialize device memory A, B & C

    // Kernel invocation: work_dim is 2 because the work sizes are two-dimensional
    size_t globalWorkSize[] = {N, N};
    size_t localWorkSize[] = {N, N};
    clEnqueueNDRangeKernel(.., 2, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL);
}

Page 19: 1 - Introduction to OpenCL

An N-dimension domain of work-items

• Global Dimensions: 1024 x 1024 (whole problem space)
• Local Dimensions: 128 x 128 (executed together)
• Choose the dimensions that are “best” for your algorithm
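
To make the relation between the two sets of dimensions concrete, here is a small plain-C sketch that derives the number of work-groups per dimension. It assumes the OpenCL 1.x rule that each global size must be an exact multiple of the corresponding local size; work_group_counts is a hypothetical helper, not an OpenCL call.

```c
#include <stddef.h>

/* Number of work-groups along each dimension.
   Returns 0 on success, -1 if a local size is zero or does not divide the
   global size evenly (the divisibility rule is assumed from OpenCL 1.x). */
int work_group_counts(const size_t *global, const size_t *local,
                      size_t dims, size_t *groups)
{
    for (size_t d = 0; d < dims; d++) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return -1;
        groups[d] = global[d] / local[d];
    }
    return 0;
}
```

With the dimensions on this slide the result is 8 x 8 = 64 work-groups of 128 x 128 = 16384 work-items each. Real devices cap the work-group size (the limit can be queried as CL_DEVICE_MAX_WORK_GROUP_SIZE), so a group this large may exceed what a given device accepts.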

Page 20: 1 - Introduction to OpenCL

Example Problem Dimensions

• 1D: 1 million elements in an array:
  – global_dim[3] = {1000000, 1, 1};
• 2D: 1920 x 1200 HD video frame, 2.3M pixels:
  – global_dim[3] = {1920, 1200, 1};
• 3D: 256 x 256 x 256 volume, 16.7M voxels:
  – global_dim[3] = {256, 256, 256};
• Choose the dimensions that are “best” for your algorithm
  – Maps well
  – Performs well
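
The element counts quoted above follow directly from the dimension triples. The sketch below, with hypothetical helper names, multiplies out a global_dim array and flattens a 3D work-item ID into a single row-major index, which is one common way to map such an ID onto a linear buffer.

```c
#include <stddef.h>

/* Total number of work-items described by a 3-element dimension array */
size_t total_work_items(const size_t global_dim[3])
{
    return global_dim[0] * global_dim[1] * global_dim[2];
}

/* Row-major linear index of work-item (x, y, z) in a range that is
   width items wide and height items tall */
size_t linear_index_3d(size_t x, size_t y, size_t z,
                       size_t width, size_t height)
{
    return (z * height + y) * width + x;
}
```

For the three examples this gives 1,000,000 elements, 2,304,000 pixels and 16,777,216 voxels, matching the rounded figures on the slide.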

Page 21: 1 - Introduction to OpenCL

BASIC COMPONENTS FOR GETTING STARTED

Page 22: 1 - Introduction to OpenCL

Basic OpenCL Program Structure

• Kernels
  – C code with some restrictions and extensions
• Host program
  – Query compute devices
  – Create contexts
  – Create memory objects associated with contexts
  – Compile and create kernel program objects
  – Issue commands to command-queue
  – Synchronization of commands
  – Clean up OpenCL resources

[Figure labels: Language, Platform Layer, Runtime]

Page 23: 1 - Introduction to OpenCL

Typical OpenCL Program

• Computation-intensive, data-parallel function written as kernel
• Host-side code
  – Context creation
  – Allocate memory on device
  – Host to device data transfer
  – Compilation and creation of kernel program objects
  – Bind memory objects to kernel arguments
  – Call a kernel function to be executed on device
  – Read back result data from device

Page 24: 1 - Introduction to OpenCL

INFORMATION ON TOOLS

Page 25: 1 - Introduction to OpenCL

OpenCL Implementation

• AMD’s implementation
  – Ships with ATI Stream SDK v2.0
  – Released on: 21st Dec, 2009
• Requires ATI GPU >= RV7XX

Page 26: 1 - Introduction to OpenCL

OpenCL Installation

• ATI Stream SDK
  – Environment variables
    • $(ATISTREAMSDKROOT) = ATI Stream SDK installation directory
    • $(ATISTREAMSDKSAMPLESROOT) = ATI Stream SDK Samples installation directory

Page 27: 1 - Introduction to OpenCL

ATI OpenCL SDK

• Header files
  – cl.h, cl_gl.h, cl_platform.h under $(ATISTREAMSDKROOT)\include\CL
• Library files
  – OpenCL.lib under $(ATISTREAMSDKROOT)\lib\x86
• Dynamic Link Library
  – OpenCL.dll under $(ATISTREAMSDKROOT)\bin\x86
  – Make sure Path contains this directory

Page 28: 1 - Introduction to OpenCL

Recap and Q&A

• Overview & Programming model
• Basic components for getting started
• Information on tools