ECE 563 Programming Parallel Machines
OpenMP Tutorial
Seung-Jai Min
School of Electrical and Computer Engineering
Purdue University, West Lafayette, IN
Parallel Programming Standards

• Thread libraries: Win32 API / POSIX threads
• Compiler directives: OpenMP (shared-memory programming) ← OUR FOCUS
• Message-passing libraries: MPI (distributed-memory programming)
Shared-Memory Parallel Programming in the Multi-Core Era

• Desktop and laptop: 2, 4, 8 cores and counting
• A single node in distributed-memory clusters: a Steele cluster node has 2 × 8 (16) cores
• Shared-memory hardware accelerators:
  – Cell processor: 1 PPE and 8 SPEs
  – NVIDIA Quadro GPUs: 128 processing units
OpenMP: Some Syntax Details to Get Us Started

• Most of the constructs in OpenMP are compiler directives or pragmas.
  – For C and C++, the pragmas take the form:
    #pragma omp construct [clause [clause]…]
  – For Fortran, the directives take one of the forms:
    C$OMP construct [clause [clause]…]
    !$OMP construct [clause [clause]…]
    *$OMP construct [clause [clause]…]
• Include file:
  #include "omp.h"
How is OpenMP typically used?

• OpenMP is usually used to parallelize loops:
  – Find your most time-consuming loops.
  – Split them up between threads.

Sequential program:

```c
void main()
{
    int i, k, N = 1000;
    double A[N], B[N], C[N];
    for (i = 0; i < N; i++) {
        A[i] = B[i] + k*C[i];
    }
}
```

Parallel program:

```c
#include "omp.h"
void main()
{
    int i, k, N = 1000;
    double A[N], B[N], C[N];
#pragma omp parallel for
    for (i = 0; i < N; i++) {
        A[i] = B[i] + k*C[i];
    }
}
```
How is OpenMP typically used? (cont.)

• Single Program Multiple Data (SPMD)

Parallel program:

```c
#include "omp.h"
void main()
{
    int i, k, N = 1000;
    double A[N], B[N], C[N];
#pragma omp parallel for
    for (i = 0; i < N; i++) {
        A[i] = B[i] + k*C[i];
    }
}
```
How is OpenMP typically used? (cont.)

• Single Program Multiple Data (SPMD): every thread runs the same program, each with its own slice of the iteration space.

```c
/* Conceptual per-thread expansion of the parallel loop.
   Thread 0: lb = 0,   ub = 250     Thread 1: lb = 250, ub = 500
   Thread 2: lb = 500, ub = 750     Thread 3: lb = 750, ub = 1000 */
void main()
{
    int i, k, N = 1000;
    double A[N], B[N], C[N];
    int lb = 0, ub = 250;           /* this thread's bounds */
    for (i = lb; i < ub; i++) {
        A[i] = B[i] + k*C[i];
    }
}
```
OpenMP Fork-and-Join Model

Execution alternates between serial and parallel phases: the master thread runs the serial parts, a team of threads is forked for each parallel loop, and the team joins back at the end of the loop.

```c
printf("program begin\n");           /* serial */
N = 1000;
#pragma omp parallel for             /* fork: parallel */
for (i = 0; i < N; i++)
    A[i] = B[i] + C[i];              /* join */
M = 500;                             /* serial */
#pragma omp parallel for             /* fork: parallel */
for (j = 0; j < M; j++)
    p[j] = q[j] - r[j];              /* join */
printf("program done\n");            /* serial */
```
OpenMP Constructs

1. Parallel regions: #pragma omp parallel
2. Worksharing: #pragma omp for, #pragma omp sections
3. Data environment: #pragma omp parallel shared/private (…)
4. Synchronization: #pragma omp barrier
5. Runtime functions/environment variables:
   int my_thread_id = omp_get_thread_num();
   omp_set_num_threads(8);
OpenMP: Structured Blocks

• Most OpenMP constructs apply to structured blocks.
  – Structured block: one point of entry at the top and one point of exit at the bottom.
  – The only "branches" allowed are STOP statements in Fortran and exit() in C/C++.
OpenMP: Structured Blocks

A structured block (the goto stays inside the block):

```c
#pragma omp parallel
{
more:
    do_big_job(id);
    if (++count > 1) goto more;
}
printf(" All done \n");
```

Not a structured block (branches into and out of the parallel region):

```c
if (count == 1) goto more;
#pragma omp parallel
{
more:
    do_big_job(id);
    if (++count > 1) goto done;
}
done:
    if (!really_done()) goto more;
```
Structured Block Boundaries

• In C/C++: a block is a single statement or a group of statements between braces {}.

```c
#pragma omp for
for (I = 0; I < N; I++) {
    res[I] = big_calc(I);
    A[I] = B[I] + res[I];
}
```

```c
#pragma omp parallel
{
    id = omp_get_thread_num();
    A[id] = big_compute(id);
}
```
Structured Block Boundaries

• In Fortran: a block is a single statement or a group of statements between directive/end-directive pairs.

```fortran
C$OMP PARALLEL
10    W(id) = garbage(id)
      res(id) = W(id)**2
      if(conv(res(id))) goto 10
C$OMP END PARALLEL
```

```fortran
C$OMP PARALLEL DO
      do I=1,N
        res(I) = bigComp(I)
      end do
C$OMP END PARALLEL DO
```
OpenMP Parallel Regions

```c
double A[1000];                  /* a single copy of A is shared
                                    between all threads */
omp_set_num_threads(4);
#pragma omp parallel
{
    int ID = omp_get_thread_num();
    pooh(ID, A);                 /* threads execute pooh(0,A) ..
                                    pooh(3,A) concurrently */
}                                /* implicit barrier: threads wait here
                                    for all threads to finish before
                                    proceeding */
printf("all done\n");
```
The OpenMP API: Combined Parallel Worksharing

• OpenMP shortcut: put the "parallel" and the worksharing directive on the same line.

```c
int i;
double res[MAX];
#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < MAX; i++) {
        res[i] = huge();
    }
}
```

```c
int i;
double res[MAX];
#pragma omp parallel for
for (i = 0; i < MAX; i++) {
    res[i] = huge();
}
```

These two forms are equivalent OpenMP.
Shared Memory Model

• Data can be shared or private.
• Shared data is accessible by all threads.
• Private data can be accessed only by the thread that owns it.
• Data transfer is transparent to the programmer.

(Figure: several threads, each with its own private memory, all connected to one shared memory.)
Data Environment: Default Storage Attributes

• Shared-memory programming model: variables are shared by default.
• Distributed-memory programming model: all variables are private.
Data Environment: Default Storage Attributes

• Global variables are SHARED among threads.
  – Fortran: COMMON blocks, SAVE variables, MODULE variables
  – C: file-scope variables, static variables
• But not everything is shared…
  – Stack variables in subprograms called from parallel regions are PRIVATE.
  – Automatic variables within a statement block are PRIVATE.
Data Environment

```c
#define N 100

int A[N][N];          /* (global) SHARED */

int foo(int x)
{
    int count = 0;    /* PRIVATE */
    return x * count;
}

int main()
{
    int ii, jj;       /* PRIVATE */
    int B[N][N];      /* SHARED */

#pragma omp parallel private(jj)
    {
        int kk = 1;   /* PRIVATE */
#pragma omp for
        for (ii = 0; ii < N; ii++)
            for (jj = 0; jj < N; jj++)
                A[ii][jj] = foo(B[ii][jj]);
    }
}
```
Work-Sharing Construct: Loop Construct

```
#pragma omp for [clause[[,] clause] …] new-line
    for-loops
```

where clause is one of the following:
  – private / firstprivate / lastprivate(list)
  – reduction(operator: list)
  – schedule(kind[, chunk_size])
  – collapse(n)
  – ordered
  – nowait
Schedule

```c
for (i = 0; i < 1100; i++)
    A[i] = … ;
```

How the 1100 iterations are divided among four threads (p0–p3) depends on the schedule clause:

• #pragma omp parallel for schedule (static, 250)
  – fixed chunks of 250 iterations: p0–p3 each take 250, and the remaining 100 wrap around to p0.
• #pragma omp parallel for schedule (static)
  – one chunk per thread: 275 iterations each for p0–p3.
• #pragma omp parallel for schedule (dynamic, 200)
  – chunks of 200 handed out on demand: five chunks of 200 plus a final 100, claimed in whatever order threads become free.
• #pragma omp parallel for schedule (guided, 100)
  – chunk size starts large and shrinks toward the minimum of 100 (e.g. 137, 120, 105, 100, …, down to the 38 left at the end).
• #pragma omp parallel for schedule (auto)
  – the compiler/runtime chooses the schedule.
Critical Construct

```c
sum = 0;
#pragma omp parallel private(lsum)
{
    lsum = 0;
    #pragma omp for
    for (i = 0; i < N; i++) {
        lsum = lsum + A[i];
    }
    #pragma omp critical
    { sum += lsum; }
}
```

Threads wait their turn; only one thread at a time executes the critical section.
Reduction Clause

```c
sum = 0;                /* shared variable */
#pragma omp parallel for reduction (+:sum)
for (i = 0; i < N; i++)
{
    sum = sum + A[i];
}
```
Performance Evaluation

• How do we measure performance? (or: how do we remove noise?)

```c
#define N 24000

for (k = 0; k < 10; k++)    /* repeat the kernel to average out noise */
{
    #pragma omp parallel for private(i, j)
    for (i = 1; i < N-1; i++)
        for (j = 1; j < N-1; j++)
            a[i][j] = (b[i][j-1] + b[i][j+1]) / 2.0;
}
```
Performance Issues

• What if you see a speedup saturation?

```c
#define N 12000

#pragma omp parallel for private(j)
for (i = 1; i < N-1; i++)
    for (j = 1; j < N-1; j++)
        a[i][j] = (b[i][j-1] + b[i][j] + b[i][j+1]
                 + b[i-1][j] + b[i+1][j]) / 5.0;
```

(Figure: speedup vs. number of CPUs, 1–8, flattening out as CPUs are added.)
Performance Issues

• What if you see a speedup saturation?

```c
#define N 12000

#pragma omp parallel for private(j)
for (i = 1; i < N-1; i++)
    for (j = 1; j < N-1; j++)
        a[i][j] = b[i][j];
```

(Figure: speedup vs. number of CPUs, 1–8.)
Loop Scheduling

• Any guideline for a chunk size?

```c
#define N <big-number>
chunk = ???;

#pragma omp parallel for schedule (static, chunk)
for (i = 2; i < N-2; i++)
    a[i] = ( b[i-2] + b[i-1] + b[i]
           + b[i+1] + b[i+2] ) / 5.0;
```
Performance Issues

• Load imbalance: triangular access pattern, so early rows carry far more work than late ones.

```c
#define N 12000

#pragma omp parallel for private(j)
for (i = 1; i < N-1; i++)
    for (j = i; j < N-1; j++)
        a[i][j] = (b[i][j-1] + b[i][j] + b[i][j+1]
                 + b[i-1][j] + b[i+1][j]) / 5.0;
```
Summary

• OpenMP has advantages:
  – Incremental parallelization
  – Compared to MPI:
    • No data partitioning
    • No communication scheduling
Resources

http://www.openmp.org
http://openmp.org/wp/resources