GPU Computing with MATLAB
Loren Dean
Director of Engineering, MATLAB Products
MathWorks
Spectrogram shows 50x speedup in a GPU cluster
Agenda
Background
Leveraging the desktop
– Basic GPU capabilities
– Multiple GPUs on a single machine
Moving to the cluster
– Multiple GPUs on multiple machines
Q&A
How many people are using…
MATLAB
MATLAB with GPUs
Parallel Computing Toolbox
– R2010b prerelease
MATLAB Distributed Computing Server
Why GPUs and why now?
Operations are IEEE compliant
Cross-platform support now available
Single/double performance in line with expectations
Parallel Computing with MATLAB: Tools and Terminology
Desktop Computer: Parallel Computing Toolbox™
Computer Cluster: MATLAB Distributed Computing Server™, Scheduler
Parallel Capabilities (each column ordered from ease of use to greater control)
Task Parallel
– Built-in support with Simulink, toolboxes, and blocksets
– parfor
– job/task
Data Parallel
– distributed array (>200 functions)
– spmd, co-distributed array
– MPI interface
Environment
– matlabpool
– Local workers
– Configurations
– batch
– MathWorks job manager
– third-party schedulers
– job/task
Evolving With Technology Changes
Single processor, Multicore, Multiprocessor, Cluster, Grid, Cloud, GPU
What’s new in R2010b?
Parallel Computing Toolbox
– GPU support
– Broader algorithm support (QR, rectangular \)
MATLAB Distributed Computing Server
– GPU support
– Run as user with MathWorks job manager
– Non-shared file system support
Simulink®
– Real-Time Workshop® support with PCT and MDCS
GPU Functionality
Call GPU(s) from MATLAB or toolbox/server worker
Support for CUDA 1.3 (compute capability 1.3) enabled devices
GPU array data type
– Store arrays in GPU device memory
– Algorithm support for over 100 functions
– Integer and double support
GPU functions
– Invoke element-wise MATLAB functions on the GPU
CUDA kernel interface
– Invoke CUDA kernels directly from MATLAB
– No MEX programming necessary
Demo hardware
Example: GPU Arrays
>> A = someArray(1000, 1000);
>> G = gpuArray(A);   % Push to GPU memory
…
>> F = fft(G);
>> x = G\b;
…
>> z = gather(x);     % Bring back into MATLAB
GPUArray Function Support
>100 functions supported
– fft, fft2, ifft, ifft2
– Matrix multiplication (A*B)
– Matrix left division (A\b)
– LU factorization
– ' and .' (transpose)
– abs, acos, …, minus, …, plus, …, sin, …
Key functions not supported
– conv, conv2, filter
– indexing
GPU Array benchmarks
A\b*      Tesla C1060   Tesla C2050 (Fermi)   Quad-core Intel CPU   Ratio (Fermi:CPU)
Single    191           250                   48                    5:1
Double    63.1          128                   25                    5:1
Ratio     3:1           2:1                   2:1
* Results in Gflops, matrix size 8192x8192. Limited by card memory. Computational capabilities not saturated.
GPU Array benchmarks
MTIMES    Tesla C1060   Tesla C2050 (Fermi)   Quad-core Intel CPU   Ratio (Fermi:CPU)
Single    365           409                   59                    7:1
Double    75            175                   29                    6:1
Ratio     4.8:1         2.3:1                 2:1
FFT       Tesla C1060   Tesla C2050 (Fermi)   Quad-core Intel CPU   Ratio (Fermi:CPU)
Single    50            99                    2.29                  43:1
Double    22.5          44                    1.47                  30:1
Ratio     2.2:1         2.2:1                 1.5:1
Example: arrayfun: Element-Wise Operations
>> y = arrayfun(@foo, x);   % Execute on GPU

function y = foo(x)
y = 1 + x.*(1 + x.*(1 + x.*(1 + ...
    x.*(1 + x.*(1 + x.*(1 + x.*(1 + ...
    x.*(1 + x./9)./8)./7)./6)./5)./4)./3)./2);
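The nested expression in foo is the degree-9 Taylor polynomial of exp(x) written in Horner form, so every element needs only a short chain of multiplies, divides, and adds. The following is a minimal CPU-only sketch that checks this; the name fooCpu and the test range [-1, 1] are assumptions made for illustration and do not appear on the slides.

```matlab
% CPU-only check; fooCpu is a hypothetical anonymous-function copy of foo.
fooCpu = @(x) 1 + x.*(1 + x.*(1 + x.*(1 + ...
    x.*(1 + x.*(1 + x.*(1 + x.*(1 + ...
    x.*(1 + x./9)./8)./7)./6)./5)./4)./3)./2);

x = linspace(-1, 1, 201);                 % assumed test range
maxErr = max(abs(fooCpu(x) - exp(x)));    % truncation error of the degree-9 series
fprintf('max |foo(x) - exp(x)| on [-1, 1]: %.2e\n', maxErr);
```

Because the function is purely element-wise, it fits the arrayfun pattern shown on the slide.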
Some arrayfun benchmarks
CPU[4] = multithreading enabled
CPU[1] = multithreading disabled
Note: Due to memory constraints, a different approach is used at N=16 and above.
Example: Invoking CUDA Kernels
% Setup
kern = parallel.gpu.CUDAKernel('myKern.ptx', cFcnSig)
% Configure
kern.ThreadBlockSize = [512 1];
kern.GridSize = [1024 1024];
% Run
[c, d] = feval(kern, a, b);
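The configure step decides how many threads the kernel launches; the values on the slide give 512 × 1024 × 1024 = 2^29 threads. As a rough sketch, the launch size could be derived from the problem size as below. It assumes the compute capability 1.3 limits of 512 threads per block and 65535 blocks per grid dimension; N is a hypothetical element count, and the code only does arithmetic, so no GPU is needed to run it.

```matlab
% Launch-size arithmetic only; no GPU is required to run this.
N               = 2^29;    % hypothetical number of elements to process
threadsPerBlock = 512;     % per-block thread limit on compute capability 1.3
maxGridDim      = 65535;   % per-dimension grid limit on compute capability 1.3

numBlocks = ceil(N / threadsPerBlock);   % blocks needed to cover every element
gridX     = min(numBlocks, maxGridDim);
gridY     = ceil(numBlocks / gridX);

threadBlockSize = [threadsPerBlock 1];
gridSize        = [gridX gridY];
% gridX*gridY can exceed numBlocks, so the kernel must skip out-of-range threads.
fprintf('ThreadBlockSize = [%d %d], GridSize = [%d %d]\n', threadBlockSize, gridSize);
```

The two vectors would then be assigned to kern.ThreadBlockSize and kern.GridSize before calling feval.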
Options for scaling up
Leverage matlabpool
– Enables desktop or cluster cleanly
– Can be done either interactively or in batch
Decide how to manage your data
– Task parallel
  Use parfor
  Same operation with different inputs
  No interdependencies between operations
– Data parallel
  Use spmd
  Allows for interdependency between operations at the CPU level
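The task-parallel case can be sketched with a small parfor loop: the same operation runs on different inputs, and no iteration reads another iteration's result. The inputs and the per-iteration work below are hypothetical and chosen only to illustrate the pattern; if no worker pool is available, the loop still runs, just serially.

```matlab
% Hypothetical task-parallel loop: each iteration depends only on its own input.
freqs = (1:8) * 0.05;              % assumed inputs, one per iteration
peak  = zeros(size(freqs));        % sliced output, one slot per iteration
n     = 0:255;                     % sample indices shared by all iterations

parfor k = 1:numel(freqs)
    spectrum = abs(fft(sin(2*pi*freqs(k)*n)));
    peak(k)  = max(spectrum);      % written only by iteration k
end
```

Because the iterations are independent, they can be handed to the workers in any order.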
MATLAB Pool Extends Desktop MATLAB
[Diagram: desktop MATLAB with toolboxes and blocksets, connected to a pool of eight workers]
Example: Spectrogram on the desktop (CPU only)
D = data;
iterations = 2000;               % # of parallel iterations
stride = iterations*step;        % stride of outer loop
M = ceil((numel(D)-W)/stride);   % iterations needed
o = cell(M, 1);                  % preallocate output
for i = 1:M
    % What are the start points
    thisSP = (i-1)*stride:step: ...
        (min(numel(D)-W, i*stride)-1);
    % Move the data efficiently into a matrix
    X = copyAndWindowInput(D, window, thisSP);
    % Take lots of fft's down the columns
    X = abs(fft(X));
    % Return only the first part to MATLAB
    o{i} = X(1:E, 1:ratio:end);
end
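The slides do not define copyAndWindowInput or the variables W, step, window, E, and ratio. The following is a minimal self-contained sketch of the same CPU loop, with all of those filled in by assumed values and a hypothetical stand-in for the helper that places one windowed segment in each column. It is CPU-only: the stand-in relies on indexing, which the slides list as unsupported for GPU arrays in this release.

```matlab
% Hypothetical inputs; none of these values come from the slides.
data   = sin(2*pi*0.05*(0:4095)).';        % test signal, column vector
W      = 256;                              % window length
step   = 64;                               % hop between window starts
window = 0.5 - 0.5*cos(2*pi*(0:W-1).'/W);  % Hann-style window built by hand
E      = W/2 + 1;                          % keep the non-negative frequencies
ratio  = 1;                                % keep every column

% Assumed stand-in for copyAndWindowInput: one windowed segment per column.
% startPoints are zero-based offsets, matching thisSP in the loop.
copyAndWindowInput = @(D, win, startPoints) bsxfun(@times, ...
    D(bsxfun(@plus, (1:numel(win)).', startPoints(:).')), win(:));

iterations = 20;                           % smaller than the slide's 2000
stride = iterations*step;
M = ceil((numel(data) - W)/stride);
o = cell(M, 1);
for i = 1:M
    thisSP = (i-1)*stride : step : (min(numel(data) - W, i*stride) - 1);
    X = copyAndWindowInput(data, window, thisSP);
    X = abs(fft(X));                       % fft works down the columns
    o{i} = X(1:E, 1:ratio:end);
end
S = [o{:}];                                % E-by-(number of windows) spectrogram
```

Each pass through the loop produces one block of spectrogram columns, which is what makes the loop a natural candidate for the GPU and parfor variants.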
Example: Spectrogram on the desktop (CPU to GPU)
Compared with the CPU-only version, the data is pushed to the GPU with gpuArray and the FFT result is brought back with gather.
D = gpuArray(data);
iterations = 2000;               % # of parallel iterations
stride = iterations*step;        % stride of outer loop
M = ceil((numel(D)-W)/stride);   % iterations needed
o = cell(M, 1);                  % preallocate output
for i = 1:M
    % What are the start points
    thisSP = (i-1)*stride:step: ...
        (min(numel(D)-W, i*stride)-1);
    % Move the data efficiently into a matrix
    X = copyAndWindowInput(D, window, thisSP);
    % Take lots of fft's down the columns
    X = gather(abs(fft(X)));
    % Return only the first part to MATLAB
    o{i} = X(1:E, 1:ratio:end);
end
Example: Spectrogram on the desktop (GPU to parallel GPU)
Compared with the single-GPU version, the for loop becomes a parfor loop, so the iterations are spread across the workers in the MATLAB pool.
D = gpuArray(data);
iterations = 2000;               % # of parallel iterations
stride = iterations*step;        % stride of outer loop
M = ceil((numel(D)-W)/stride);   % iterations needed
o = cell(M, 1);                  % preallocate output
parfor i = 1:M
    % What are the start points
    thisSP = (i-1)*stride:step: ...
        (min(numel(D)-W, i*stride)-1);
    % Move the data efficiently into a matrix
    X = copyAndWindowInput(D, window, thisSP);
    % Take lots of fft's down the columns
    X = gather(abs(fft(X)));
    % Return only the first part to MATLAB
    o{i} = X(1:E, 1:ratio:end);
end
Spectrogram shows 50x speedup in a GPU cluster
But… speedup is 5x with data transfer
(Remember we have to transfer data over GigE and then the PCIe bus)
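The gap between the two figures shows how much of the accelerated run is spent moving data. A back-of-the-envelope sketch follows, assuming both speedups are measured against the same CPU baseline and that transfer and computation do not overlap; neither assumption is stated on the slides.

```matlab
% Both speedup figures are from the slides; everything else is derived.
computeSpeedup = 50;    % speedup excluding data transfer
overallSpeedup = 5;     % speedup including data transfer

tCpu        = 1;                          % normalize the CPU run time to 1
tGpuCompute = tCpu / computeSpeedup;      % time spent computing on the GPUs
tTotal      = tCpu / overallSpeedup;      % compute plus transfer
tTransfer   = tTotal - tGpuCompute;       % time spent on GigE and PCIe
transferShare = tTransfer / tTotal;

fprintf('Transfer takes %.0f%% of the accelerated run (%.0fx the compute time)\n', ...
    100*transferShare, tTransfer/tGpuCompute);
```

Under these assumptions, transfer accounts for about 90% of the accelerated run time, roughly nine times the GPU compute time.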
GPU can go well beyond the CPU
[Chart callouts: ~6 seconds; CPU ~10-15 seconds; ~16 seconds; 65x]
Summary of Options for Targeting GPUs
Across one or more GPUs on one or more machines (listed from ease of use to greater control):
1) Use GPU array interface with MATLAB built-in functions
2) Execute custom functions on elements of the GPU array
3) Create kernels from existing CUDA code and PTX files
What hardware is supported?
NVIDIA hardware meeting the CUDA 1.3 hardware spec.
A listing can be found at: http://www.nvidia.com/object/cuda_gpus.html
How come function_xyz is not GPU-accelerated?
The accelerated functions available in this first release were gated by available resources.
We will add capabilities with coming releases based on requirements and feedback.
Why did we adopt CUDA and not OpenCL?
CUDA has the only ecosystem with all of the libraries necessary for technical computing
Why are CUDA 1.1 and CUDA 1.2 not supported?
As mentioned earlier, CUDA 1.3 offers the following capabilities that earlier releases of CUDA do not:
– Support for doubles. The base data type in MATLAB is double.
– IEEE compliance. We want to ensure we get the correct answer.
– Cross-platform support.
What benchmarks are available?
Some benchmarks are available in the product and at www.mathworks.com/products/parallel-computing/
More will be added over time