High Level OpenCL Implementation
![Page 1: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/1.jpg)
By: Matthew Royle
Supervisor: Prof. Shaun Bangay
![Page 2: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/2.jpg)
Multi-core CPUs
Sequential algorithms to parallel algorithms
GPUs used for more than just graphics
Use of GPGPUs (General-Purpose Graphics Processing Units)
![Page 3: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/3.jpg)
Parallel programming languages for specific architectures, namely NVIDIA’s CUDA
Lack of a multi-platform open language
The OpenCL (Open Computing Language) standard
Heterogeneous Parallel Programming
![Page 4: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/4.jpg)
Parallel nature of GPUs
No existing OpenCL implementation
Implement OpenCL using existing technologies
High level translator
Use Parallel Frameworks
![Page 5: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/5.jpg)
![Page 6: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/6.jpg)
GPU most likely form of implementation
NVIDIA and AMD plan to include OpenCL
Future Apple iPhones
Lack of implementation on CPU architecture
![Page 7: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/7.jpg)
Select a parallel processing framework
Create a high level translator
Create valid tests
Run created tests
![Page 8: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/8.jpg)
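// Simplified OpenCL usage (pseudocode; real OpenCL signatures take more arguments)
__kernel void add_vect();                         // create computation unit (OpenCL kernels return void)
cl_cmd_queue cmd_queue = CreateCommandQueue();    // create computation queue
clEnqueueTask(kernel, i);                         // enqueue task and execute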
![Page 9: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/9.jpg)
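// Translator sketch (pseudocode): OpenCL calls mapped onto an array of tasks
cl_cmd_queue CreateCommandQueue() { return cmd_queue; }
void clEnqueueTask(kernel, i) { cmd_queue[i] = kernel; }
// run all queued kernels, sharing the loop iterations among OpenMP threads
#pragma omp parallel for
for (int k = 0; k < cmd_queue.length; k++)
    Execute(cmd_queue[k]);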
![Page 10: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/10.jpg)
John Conway’s Game Of Life
Fractal Flame algorithm
![Page 11: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/11.jpg)
OpenMP (Open Multi-Processing) framework
Parallel Processing Framework
Available with the GNU Compiler Collection
Free!
OpenCL header files
![Page 12: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/12.jpg)
/* scalar types */
typedef int8_t cl_char;
typedef uint8_t cl_uchar;
typedef int16_t cl_short __attribute__((aligned(2)));
typedef uint16_t cl_ushort __attribute__((aligned(2)));
typedef int32_t cl_int __attribute__((aligned(4)));
typedef uint32_t cl_uint __attribute__((aligned(4)));
typedef int64_t cl_long __attribute__((aligned(8)));
typedef uint64_t cl_ulong __attribute__((aligned(8)));
typedef uint16_t cl_half __attribute__((aligned(2)));
typedef float cl_float __attribute__((aligned(4)));
typedef double cl_double __attribute__((aligned(8)));
![Page 13: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/13.jpg)
//hello.c
#include <omp.h>
#include <stdio.h>
int main() {
    #pragma omp parallel num_threads(10)
    printf("Hello from thread %d, nthreads %d\n",
           omp_get_thread_num(), omp_get_num_threads());
}
![Page 14: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/14.jpg)
![Page 15: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/15.jpg)
Improve performance
Evaluation of OpenCL on various architectures
Heterogeneous execution
![Page 16: High Level OpenCL Implementation](https://reader035.fdocuments.us/reader035/viewer/2022062321/5681324c550346895d98c2b4/html5/thumbnails/16.jpg)
Lack of multi-platform open language
OpenCL standard
Most implementations for GPU
Implementation for CPU
High Level Translator
Use OpenMP framework