PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
Transcript of PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
![Page 1: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/1.jpg)
Software Libraries for CUDA & OpenCL
![Page 2: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/2.jpg)
Heterogeneous Computing is Hard
Two Examples:
1. Median Filtering
2. Local Windowing
![Page 3: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/3.jpg)
Median Filtering
Increasingly
Difficult
![Page 4: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/4.jpg)
Local Windowing
The best algorithm changes depending on which device is in the system.

| Time (ms) | Device 1 | Device 2 | Device 3 | Device 4 |
|---|---|---|---|---|
| Algorithm 1 | 395 | 599 | 244 | 102 |
| Algorithm 2 | 270 | 703 | 241 | 103 |
| Algorithm 3 | 699 | 407 | 138 | 116 |
| Algorithm 4 | 380 | 522 | 202 | 98 |
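To make the point of the table concrete, here is a minimal sketch that picks the fastest algorithm for each device from the timings above. It assumes every figure is in milliseconds, and `kTimingsMs` and `best_algorithm_for` are hypothetical names introduced only for this illustration.

```cpp
#include <array>
#include <cstddef>

// Timings in milliseconds copied from the slide:
// rows are Algorithms 1-4, columns are Devices 1-4.
constexpr std::array<std::array<int, 4>, 4> kTimingsMs = {{
    {395, 599, 244, 102},
    {270, 703, 241, 103},
    {699, 407, 138, 116},
    {380, 522, 202, 98},
}};

// Returns the 1-based number of the fastest algorithm for a 0-based device
// column. Hypothetical helper, not part of any library.
inline int best_algorithm_for(std::size_t device) {
    std::size_t best = 0;
    for (std::size_t alg = 1; alg < kTimingsMs.size(); ++alg) {
        if (kTimingsMs[alg][device] < kTimingsMs[best][device]) {
            best = alg;
        }
    }
    return static_cast<int>(best) + 1;
}
```

Calling `best_algorithm_for(0)` through `best_algorithm_for(3)` selects Algorithms 2, 3, 3, and 4 respectively, so no single algorithm wins on every device.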
![Page 5: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/5.jpg)
Why Software Libraries Are Great
Reduce many lines of code to one line
Obsessively tuned by experts; faster than DIY
Well-tested and maintained
Continuously improving
![Page 6: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/6.jpg)
Five Influencers (besides price)
Portability Scalability Community
Programmability Performance
![Page 7: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/7.jpg)
Performance & Programmability
[Chart axes: Slower to Faster; Easy-to-use to Time-consuming. Plotted: SSE or AVX]
![Page 8: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/8.jpg)
Performance & Programmability
[Chart axes: Slower to Faster; Easy-to-use to Time-consuming. Plotted: Writing Kernels, SSE or AVX]
![Page 9: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/9.jpg)
Performance & Programmability
[Chart axes: Slower to Faster; Easy-to-use to Time-consuming. Plotted: Writing Kernels, Compiler Directives, SSE or AVX]
![Page 10: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/10.jpg)
Performance & Programmability
[Chart axes: Slower to Faster; Easy-to-use to Time-consuming. Plotted: Writing Kernels, Using Libraries, Compiler Directives, SSE or AVX]
![Page 11: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/11.jpg)
Performance
![Page 12: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/12.jpg)
Performance
![Page 13: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/13.jpg)
Portability
Flavors of portability
HW vendor options
Accelerator options (GPU, coprocessor, FPGA)
CPU fallback
High-performance mobile computing
Libraries can provide portability
![Page 14: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/14.jpg)
Scalability
Always start with one device
Potential headaches of adding devices
Performance hit
Development complexity
Libraries can make scaling easy
![Page 15: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/15.jpg)
Community
What do you do when bugs arise?
Continuous refinement
Someone to answer questions
Libraries can have great community support
![Page 16: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/16.jpg)
Benefits of Using a Library
Development
Documentation
Test and QA
Maintenance
Porting
[Figure labels: TIME, COST; Pain, Pleasure]
Libraries eliminate hidden costs of software development
![Page 17: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/17.jpg)
ArrayFire: Technical Computing
![Page 18: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/18.jpg)
Performance & Programmability
Super easy to program
Highly optimized
![Page 19: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/19.jpg)
Portability
![Page 20: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/20.jpg)
Scalability
Multi-GPU is 1-line of code
array *y = new array[n];
for (int i = 0; i < n; ++i) {
    deviceset(i);          // change GPUs
    array x = randu(5,5);  // add work to GPU's queue
    y[i] = fft(x);         // more work in queue
}
// all GPUs are now computing simultaneously
![Page 21: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/21.jpg)
Community
Over 8,000 posts at
http://forums.accelereyes.com
Nightly library update releases
Stable releases a few times a year
v2.0 coming at the end of summer
![Page 22: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/22.jpg)
Example Case Studies 1
45X
Radar Imaging
System Planning
17X
Neuro-imaging
Georgia Tech
20X
Video Processing
12X
Medical Devices
Spencer Tech
20X
Viral Analyses
CDC
![Page 23: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/23.jpg)
Example Case Studies 2
70X
Drug Delivery
Georgia Tech
5X
Weather Models
NCAR
17X
Surveillance
BAE Systems
35X
Bioinformatics
Leibnitz
35X
Power Eng
IIT India
![Page 24: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/24.jpg)
Hundreds of Functions
reductions
• sum, min, max, count, prod
• vectors, columns, rows, etc.
convolutions
• 2D, 3D, ND
dense linear algebra
• LU, QR, Cholesky, SVD, Eigenvalues, Inversion, Solvers, Determinant, Matrix Power
FFTs
• 2D, 3D, ND
image processing
• filter, rotate, erode, dilate, morph, resize, rgb2gray, histograms
interpolate & scale
• vectors, matrices
• rescaling
sorting
• along any dimension
• sort detection
and many more…
![Page 25: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/25.jpg)
Intuitive Functions (estimate π)
#include <stdio.h>
#include <arrayfire.h>
using namespace af;
int main() {
    // 20 million random samples
    int n = 20e6;
    array x = randu(n,1), y = randu(n,1);
    // how many fell inside unit circle?
    float pi = 4 * sum<float>(x*x + y*y < 1) / n;
    printf("pi = %g\n", pi);
    return 0;
}
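For comparison, the same Monte Carlo estimate can be written as a serial CPU loop using only the C++ standard library. This is a sketch of what the array expression above replaces, not ArrayFire code; `estimate_pi` and its `seed` parameter are illustrative names.

```cpp
#include <cstddef>
#include <random>

// Serial CPU version of the Monte Carlo estimate of pi.
// estimate_pi and seed are illustrative names, not ArrayFire API.
inline double estimate_pi(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::size_t inside = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = unit(gen);
        const double y = unit(gen);
        // Points are drawn from the unit square, so this counts the ones
        // that land inside the quarter of the unit circle it contains.
        if (x * x + y * y < 1.0) {
            ++inside;
        }
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(n);
}
```

The fraction of points inside the quarter circle approaches π/4, which is why the count is multiplied by 4.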
![Page 26: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/26.jpg)
Data Types
f32 real single precision
f64 real double precision
c32 complex single precision
c64 complex double precision
b8 boolean byte
s32 signed integer
u32 unsigned integer
array container object
array x = randu(n, f32);
array y = randu(n, f64);
array z = randu(n, u32);
![Page 27: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/27.jpg)
ND Support
vectors, matrices, volumes, … ND
![Page 28: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/28.jpg)
Subscripting
ArrayFire Keywords: end, span
A(1,1)
A(1,span)
A(end,1)
A(end,span)
A(span,span,2)
![Page 29: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/29.jpg)
Generate Arrays
constant(0,3) // 3-by-1 column of zeros, single-precision
constant(1,3,2,f64) // 3-by-2 matrix, double-precision
randu(1,8) // row vector (1x8) of random values (uniform)
randn(2,2) // square matrix (2x2) random values (normal)
identity(3,3) // 3-by-3 identity
randu(5,7,c32) // complex random values
![Page 30: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/30.jpg)
Create Arrays from CPU Data
float hA[] = {0,1,2,3,4,5};
array A(2,3,hA); // 2x3 matrix, single-precision
print(A);
// A = [ 0 2 4 ] Note: Fortran storage order
// [ 1 3 5 ]
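The Fortran (column-major) storage order noted above means consecutive values in the host buffer fill a column before moving to the next one. A minimal sketch of the index arithmetic, using a hypothetical helper `at_colmajor` that is not part of ArrayFire:

```cpp
#include <cstddef>

// Column-major (Fortran-order) lookup: element (row, col) of a matrix with
// `rows` rows lives at offset row + col * rows in the flat buffer.
// at_colmajor is a hypothetical helper for illustration only.
inline float at_colmajor(const float* data, std::size_t rows,
                         std::size_t row, std::size_t col) {
    return data[row + col * rows];
}
```

With `hA = {0,1,2,3,4,5}` and 2 rows, `at_colmajor(hA, 2, 0, 1)` reads offset 2 and `at_colmajor(hA, 2, 1, 2)` reads offset 5, matching the printed matrix.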
![Page 31: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/31.jpg)
Arithmetic
array R = randu(3,3);
array C = constant(1,3,3) + complex(sin(R)); // C is c32
// rescale complex values to unit circle
array a = randn(5,c32);
print(a / abs(a));
![Page 32: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/32.jpg)
L-2 Norm Example
// calculate L-2 norm of every column
sqrt(sum(pow(X, 2))) // norm of every column vector
sqrt(sum(pow(X, 2), 0)) // ..same
sqrt(sum(pow(X, 2), 1)) // norm of every row vector
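As a reference for what the column-wise expression computes, here is a plain C++ sketch of the L-2 norm of every column. It assumes column-major storage, as in the earlier slide, and `column_norms` is a hypothetical name rather than a library function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L-2 norm of each column of a rows-by-cols matrix stored column-major.
// column_norms is a hypothetical helper, not an ArrayFire call.
inline std::vector<double> column_norms(const std::vector<double>& x,
                                        std::size_t rows, std::size_t cols) {
    std::vector<double> norms(cols, 0.0);
    for (std::size_t c = 0; c < cols; ++c) {
        double sum_sq = 0.0;
        for (std::size_t r = 0; r < rows; ++r) {
            const double v = x[r + c * rows];
            sum_sq += v * v;  // pow(X, 2) followed by sum along the column
        }
        norms[c] = std::sqrt(sum_sq);
    }
    return norms;
}
```

For the 2-by-2 matrix with columns (3, 4) and (6, 8), the result is 5 and 10.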
![Page 33: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/33.jpg)
Subscripting Examples
array A = randu(3,3);
array a1 = A(0); // first element
array a2 = A(0,1); // first row, second column
A(1,span); // second row
A.row(end); // last row
A.cols(1,end); // all but first column
![Page 34: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/34.jpg)
Subscripting Examples
float b_ptr[] = {0,1,2,3,4,5,6,7,8,9};
array b(1,10,b_ptr);
b(seq(3)); // {0,1,2}
b(seq(1,7)); // {1,2,3,4,5,6,7}
b(seq(1,2,7)); // {1,3,5,7}
b(seq(0,2,end)); // {0,2,4,6,8}
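The index sets in the comments above can be reproduced with a small standalone helper. This sketch assumes the three-argument form means first, step, last with an inclusive end, as the slide's own results suggest; `seq_indices` is a hypothetical name, not the library's `seq`.

```cpp
#include <vector>

// Inclusive index range: first, first + step, ... up to and including last.
// seq_indices is a hypothetical stand-in for illustration only.
inline std::vector<int> seq_indices(int first, int step, int last) {
    std::vector<int> out;
    if (step <= 0) {
        return out;  // this sketch only handles forward ranges
    }
    for (int i = first; i <= last; i += step) {
        out.push_back(i);
    }
    return out;
}
```

For the ten-element array `b`, `end` corresponds to index 9, so `seq_indices(0, 2, 9)` yields the same indices as `seq(0,2,end)`.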
![Page 35: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/35.jpg)
Data Manipulation
// setting entries to a constant
A(span) = 4; // fill entire array
A.row(0) = -1; // first row
A(seq(3)) = 3.1415; // first three elements
![Page 36: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/36.jpg)
Data Manipulation
// copy in another matrix
array B = constant(1,4,4,f64);
B.row(0) = randu(1,4,f32); // set row (upcast)
![Page 37: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/37.jpg)
Data Manipulation
// index with another array
float h_inds[] = {0, 4, 2, 1}; // zero-based
array inds(1,4,h_inds);
B(inds) = randu(4,1); // set to random
![Page 38: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/38.jpg)
Linear Algebra
// matrix factorization
array L, U;
lu(L, U, randu(n,n));
// linear systems: A x = b
array A = randu(n,n), b = randu(n,1);
array x = solve(A,b);
![Page 39: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/39.jpg)
Graphics Functions
asynchronous
non-blocking
throttled at 35 Hz
![Page 40: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/40.jpg)
Graphics Functions
non-blocking primitives
surface - surface plotting (2d data)
image - intensity image visualization
arrows - vector fields
plot2 - line plotting (x,y)
plot3 - scatter plot (x,y,z)
volume - volume rendering for 3d data
![Page 41: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/41.jpg)
Graphics Functions
utility commands
keep_on keep_off
subfigure
palette
clearfig
draw (blocking)
figure
title
close
![Page 42: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/42.jpg)
Graphics Example
#include <arrayfire.h>
using namespace af;
int main() {
    // random 3d surface
    const int n = 256;
    while (1) {
        array x = randu(n,n);
        // 3d surface plot
        surface(x);
    }
    return 0;
}
![Page 43: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/43.jpg)
GFOR Parallel Loops
gfor (array i, 3)
C(span,span,i) = A(span,span,i) * B;
Parallel matrix multiplications (1 kernel launch)
C(,,1) = A(,,1) * B
C(,,2) = A(,,2) * B
C(,,3) = A(,,3) * B
![Page 44: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/44.jpg)
GFOR Parallel Loops
C(,,1:3) = A(,,1:3) * B
gfor (array i, 3)
C(span,span,i) = A(span,span,i) * B;
Parallel matrix multiplications (1 kernel launch)
![Page 45: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/45.jpg)
GFOR Parallel Loops
gfor (array i, 3)
C(span,span,i) = A(span,span,i) * B;
Parallel matrix multiplications (1 kernel launch)
C = A * B
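For reference, the computation that `C(span,span,i) = A(span,span,i) * B` expresses can be written as an ordinary serial loop. This plain C++ sketch assumes column-major slices stored back to back; `batched_matmul` is a hypothetical name, and the outer loop over `i` is the part GFOR is described as folding into a single kernel launch.

```cpp
#include <cstddef>
#include <vector>

// Serial reference for C(:,:,i) = A(:,:,i) * B over `batch` slices.
// A holds `batch` column-major m-by-k slices, B is k-by-n (column-major),
// and the result holds `batch` column-major m-by-n slices.
// batched_matmul is a hypothetical helper, not an ArrayFire call.
inline std::vector<double> batched_matmul(const std::vector<double>& A,
                                          const std::vector<double>& B,
                                          std::size_t m, std::size_t k,
                                          std::size_t n, std::size_t batch) {
    std::vector<double> C(m * n * batch, 0.0);
    for (std::size_t i = 0; i < batch; ++i) {  // the loop GFOR batches
        for (std::size_t col = 0; col < n; ++col) {
            for (std::size_t row = 0; row < m; ++row) {
                double acc = 0.0;
                for (std::size_t p = 0; p < k; ++p) {
                    acc += A[row + p * m + i * m * k] * B[p + col * k];
                }
                C[row + col * m + i * m * n] = acc;
            }
        }
    }
    return C;
}
```

Each slice is independent of the others, which is what makes the iterations safe to run in parallel.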
![Page 46: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/46.jpg)
Four Quick Stories in Conclusion
Advertising, Healthcare, Finance, Oil & Gas
![Page 47: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/47.jpg)
Virtual Glasses Try-On
![Page 48: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/48.jpg)
Acceleration Demands
The CPU code
45 seconds for one session to complete
Highly optimized OpenMP code leveraging all cores
1,000 sessions/minute required 750 CPU nodes
Convert Mac-only research code to C#
Focus on efficiently developed robust performance
![Page 49: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/49.jpg)
ArrayFire Solution
Linear algebra
Matrix multiply, Transpose
Linear solvers
Image processing
Convolutions
Fast Fourier Transform
Correlation Filter
Sobel Filter
Gaussian Blur
OpenCV functions
Custom edge detection
Graphics
Rendering points
Reductions
Min, Max, Sum
JIT
Increased productivity
![Page 50: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/50.jpg)
Results
3X acceleration
Dropped from 750 nodes to 250 nodes
Benefit from ongoing library support
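The node counts follow from the figures on the earlier slide (45 seconds per session, 1,000 sessions per minute) if one assumes each session occupies a whole node for its duration. A small sketch of that arithmetic, with `nodes_required` as a hypothetical helper name:

```cpp
// Nodes needed to sustain a target throughput, assuming each session
// occupies one node for its full duration (an assumption, not stated
// on the slide). nodes_required is a hypothetical helper.
inline double nodes_required(double sessions_per_minute,
                             double seconds_per_session) {
    return sessions_per_minute * seconds_per_session / 60.0;
}
```

`nodes_required(1000, 45)` gives 750, and a 3X speedup (15 seconds per session) gives 250, consistent with the numbers reported.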
![Page 51: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/51.jpg)
Culture-Free Microbiology
Filling
Filled
Computer-controlled pipettes
![Page 52: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/52.jpg)
Microscope
A computer-controlled microscope scans a
cassette of pipettes, changes imaging
modes, and acquires digital images
according to program
![Page 53: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/53.jpg)
Acceleration Demands
This platform provides a rapid alternative to traditional cell culturing for susceptibility testing
The faster the analysis pipeline, the sooner a patient can be diagnosed and treated with an antibiotic
Culture-based methods can take 2-3 days, which is problematic for many critically ill patients
![Page 54: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/54.jpg)
ArrayFire Solution
Image Processing
Heavily filter based
Convolve, Filter, Resize
Image Statistics
Mean, StdDev, Variance
![Page 55: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/55.jpg)
Results
Realtime throughput

| Kernel | Speedup |
|---|---|
| Image Registration (heavy use of statistics functions) | 73.17x |
| Custom Filter (Prep Center Image) | 26.48x |
| Gaussian Blur | 2.19x |
![Page 56: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/56.jpg)
Hedge Protection System
![Page 57: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/57.jpg)
Acceleration Demands
CPU-only version was taking 115 hours
Needs to run entire database of portfolios
each night before trading begins next day
![Page 58: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/58.jpg)
ArrayFire Solution
Statistics Functions
Random number generation
Variance
Exponentials
Arithmetic
Sqrt
Element-wise math
Reductions
Sum
![Page 59: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/59.jpg)
Results
GPU version drops runtime to 7 hours and
meets the requirement to run overnight
Time left over to try more permutations
![Page 60: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/60.jpg)
Oil Well Monitoring
Ordinary telecom fiber used as an efficient, high-fidelity acoustic sensor
Threaded along the length of the oil well
![Page 61: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/61.jpg)
Acceleration Demands
Require realtime signal processing from 24
channels per unit with an onsite server
CPU-only solution was 5x slower than realtime
![Page 62: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/62.jpg)
ArrayFire Solution
Heavy usage of signal filtering functions
FIR
IIR
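As an illustration of the kind of filtering named here, the following is a minimal direct-form FIR filter in plain C++. It is a generic textbook sketch, not the customer's pipeline or an ArrayFire function; `fir_filter` is a hypothetical name, and samples before the start of the signal are assumed to be zero.

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k],
// treating samples before the start of the signal as zero.
// fir_filter is a hypothetical helper for illustration only.
inline std::vector<double> fir_filter(const std::vector<double>& taps,
                                      const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k) {
            y[n] += taps[k] * x[n - k];
        }
    }
    return y;
}
```

With taps `{0.5, 0.5}` (a two-point moving average) and input `{2, 4, 6}`, the output is `{1, 3, 5}`. Every output sample depends only on input samples, so the work parallelizes well across samples and channels.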
![Page 63: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/63.jpg)
Results
6x performance improvements in signal
processing
20x overall performance improvement
through more efficiently vectorized code
![Page 64: PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos](https://reader031.fdocuments.us/reader031/viewer/2022020122/54840358b4af9fbd5d8b45e6/html5/thumbnails/64.jpg)
Software Shop for CUDA & OpenCL
Two ways to work with us:
Use
Hire our CUDA & OpenCL developers
Code development; CUDA & OpenCL training