HC20.24.250: CUDA Application Development Experience


[HKR HotChips-2007]

6-12 weeks

[Figure: MRI reconstruction approaches. (a) Cartesian scan data + FFT; (b) spiral scan data + gridding + FFT; (c) spiral scan data + iterative reconstruction.]

Spiral scan data + iterative reconstruction: fast scan reduces artifacts, and iterative reconstruction increases SNR. Reconstruction requires a lot of computation.

Courtesy of Keith Thulborn and Ian Atkinson, Center for MR Research, University of Illinois at Chicago

[Pipeline: Compute Q → Acquire Data → Compute FHd → Find ρ]

More than 99.5% of the time is spent in these compute steps.

Haldar, et al., “Anatomically-constrained reconstruction from noisy data,” Magnetic Resonance in Medicine.

Reconstruction of a 64³ image used to take days!

Performance: 128 GFLOPS

Time: 1.2 minutes

    for (m = 0; m < M; m++) {
        phi[m] = rPhi[m]*rPhi[m] + iPhi[m]*iPhi[m];
        for (n = 0; n < N; n++) {
            exp = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
            rQ[n] += phi[m]*cos(exp);
            iQ[n] += phi[m]*sin(exp);
        }
    }




    for (n = 0; n < N; n++) {
        for (m = 0; m < M; m++) {
            phi[m] = rPhi[m]*rPhi[m] + iPhi[m]*iPhi[m];
            exp = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
            rQ[n] += phi[m]*cos(exp);
            iQ[n] += phi[m]*sin(exp);
        }
    }



    for (m = 0; m < M; m++) {
        phi[m] = rPhi[m]*rPhi[m] + iPhi[m]*iPhi[m];
    }

    for (n = 0; n < N; n++) {
        for (m = 0; m < M; m++) {
            exp = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
            rQ[n] += phi[m]*cos(exp);
            iQ[n] += phi[m]*sin(exp);
        }
    }



    for (m = 0; m < M/32; m++) {
        exp = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
        rQ[n] += phi[m]*cos(exp);
        iQ[n] += phi[m]*sin(exp);
    }


    for (m = 31*M/32; m < M; m++) {
        exp = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
        rQ[n] += phi[m]*cos(exp);
        iQ[n] += phi[m]*sin(exp);
    }


    __global__ void Q(float *x, float *y, float *z,
                      float *rQ, float *iQ,
                      float *kx, float *ky, float *kz,
                      float *phi, int startM, int endM)
    {
        int n = blockIdx.x*TPB + threadIdx.x;
        for (int m = startM; m < endM; m++) {
            float exp = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
            rQ[n] += phi[m]*cos(exp);
            iQ[n] += phi[m]*sin(exp);
        }
    }


A.N. Netravali and B.G. Haskell, Digital Pictures: Representation, Compression, and Standards (2nd Ed), Plenum Press, New York, NY (1995).


Increase in per-thread performance, but fewer threads: lower overall performance.


[Chart: speedups of 8X, 108X, 228X, and 357X]

• Programmers are doing too much heavy lifting
• Too many memory organizational details are exposed to the programmers

Sum of Absolute Differences

By selecting only Pareto-optimal points, we pruned the search space by 98% and still found the optimal configuration.

S. Ryoo, et al., “Program Optimization Space Pruning for a Multithreaded GPU,” ACM/IEEE CGO, April 2008.

[Tool roadmap, targeting IA multi-core & Larrabee and NVIDIA GPUs:]

• CUDA (NVIDIA SDK 1.1): 1st-generation CUDA programming with explicit, hardwired thread organizations and explicit management of memory types and data transfers
• MCUDA/OpenMP: translates CUDA kernels for execution on IA multi-core
• CUDA-tune: parameterized CUDA programming using auto-tuning and optimization space pruning
• CUDA-lite: locality annotation programming to eliminate the need for explicit management of memory types and data transfers
• CUDA-auto: implicitly parallel programming with data structure and function property annotations to enable auto parallelization

S.S. Stone, et al., “Accelerating Advanced MRI Reconstruction using GPUs,” ACM Computing Frontiers Conference 2008, Italy, May 2008.

10 kernels, less than 1.5 min after acceleration

    Matrixmul(A[ ], B[ ], C[ ])
    {
        __shared__ Asub[ ][ ], Bsub[ ][ ];
        int a, b, c;
        float Csub;
        int k;
        for (…)
        {
            Asub[tx][ty] = A[a];
            Bsub[tx][ty] = B[b];
            __syncthreads();
            for (k = 0; k < blockDim.x; k++)
                Csub += Asub[ty][k] * Bsub[k][tx];
            __syncthreads();
        }
    }

    Matrixmul(A[ ], B[ ], C[ ])
    {
        __shared__ Asub[ ][ ], Bsub[ ][ ];
        int a, b, c;
        float Csub;
        int k;
        for (…)
        {
            for (ty = 0; ty < blockDim.y; ty++)
                for (tx = 0; tx < blockDim.x; tx++)
                {
                    Asub[tx][ty] = A[a];
                    Bsub[tx][ty] = B[b];
                }
            for (ty = 0; ty < blockDim.y; ty++)
                for (tx = 0; tx < blockDim.x; tx++)
                {
                    for (k = 0; k < blockDim.x; k++)
                        Csub += Asub[ty][k] * Bsub[k][tx];
                }
        }
    }

•  Consistent speed-up over hand-tuned single-thread code

•  Best optimizations for GPU and CPU not always the same

*Over hand-optimized CPU

**Intel MKL, multi-core execution

Thank you! Any questions?