OpenMP
Technische Universität München, Parallel Programming lecture (transcript of slides)
Shared Memory Architecture

[Diagram: multiple processors connected to a common memory via a shared bus]
OpenMP
• Portable programming of shared memory systems.
• It is a quasi-standard, developed by the OpenMP Forum.
• Work started in 1997.
• Current standard OpenMP 4.0 from July 2013
• API for Fortran and C/C++
• directives
• runtime routines
• environment variables
• www.openmp.org
Example

Program:

#include <stdio.h>
#include <omp.h>
int main(){
#pragma omp parallel
{
printf("Hello world\n");
}
}

Compilation:

> icc -O3 -openmp openmp.c

Execution:

> export OMP_NUM_THREADS=3
> a.out
Hello world
Hello world
Hello world
> export OMP_NUM_THREADS=2
> a.out
Hello world
Hello world
Execution Model

#pragma omp parallel
{
printf("Hello world %d\n", omp_get_thread_num());
}

[Diagram: the master thread T0 creates a team (T0, T1, T2) at the PARALLEL
construct; each thread executes the print; the team is destroyed at the end
of the region and T0 continues alone]
Fork/Join Execution Model
1. An OpenMP-program starts as a single thread (master
thread).
2. Additional threads (Team) are created when the master hits
a parallel region.
3. When all threads have finished the parallel region, the additional
threads are returned to the runtime system or operating system.
• A team consists of a fixed set of threads executing
the parallel region redundantly.
• All threads in the team are synchronized at the end
of a parallel region via a barrier.
• The master continues after the parallel region.
Work Sharing in a Parallel Region
main (){
int a[100];
#pragma omp parallel
{
#pragma omp for
for (int i = 1; i < n; i++)
a[i] = i;
…
}
}
Shared and Private Data
• Shared data are accessible by all threads. A reference
a[5] to a shared array accesses the same address in
all threads.
• Private data are accessible only by a single thread.
Each thread has its own copy.
• The default is shared.
Private clause for parallel loop
main (){
int a[100], t;
#pragma omp parallel
{
#pragma omp for private(t)
for (int i = 1; i < n; i++){
t = f(i);
a[i] = t;
}
}
}
Example: Private Data

i = 3;
#pragma omp parallel private(i)
{
i = 17;
}
printf("Value of i=%d\n", i);

[Diagram: with private(i), each thread sets its own copy (i1 = 17, i2 = 17,
i3 = 17) and the print after the region shows the shared i, still 3; without
the private clause, all threads write the shared i and the print shows 17]
Example

main (){
int iam, nthreads;
#pragma omp parallel private(iam, nthreads)
{
iam = omp_get_thread_num();
nthreads = omp_get_num_threads();
printf("ThreadID %d, out of %d threads\n", iam, nthreads);
if (iam == 0) // different control flow
printf("Here is the Master Thread.\n");
else
printf("Here is another thread.\n");
}
}
Private Data
• A new copy is created for each thread.
• One thread may reuse the global shared copy.
• The private copies are destroyed after the parallel
region.
• The value of the shared copy is undefined.
Example: Shared Data

i = 77;
#pragma omp parallel shared(i)
{
i = omp_get_thread_num();
}
printf("Value of i=%d\n", i);

[Diagram: before the region i = 77; in the parallel region all threads write
the same shared i concurrently (a race); after the region i holds the thread
number of whichever thread wrote last]
Syntax of Directives and Pragmas

Fortran
!$OMP directive name [parameters]

!$OMP PARALLEL DEFAULT(SHARED)
write(*,*) 'Hello world'
!$OMP END PARALLEL

C / C++
#pragma omp directive name [parameters]

int main() {
#pragma omp parallel default(shared)
{
printf("hello world\n");
}
}
Directives

Directives can have continuation lines.

• Fortran
!$OMP directive name first_part &
!$OMP continuation_part

• C
#pragma omp parallel private(i) \
                     private(j)
Parallel Region

#pragma omp parallel [parameters]
{
parallel region
}
• The statements enclosed lexically within a region
define the lexical extent of the region.
• The dynamic extent further includes the routines
called from within the construct.
Lexical and Dynamic Extent
main (){
int a[100];
#pragma omp parallel
{
…
}
}
sub(int a[])
{
#pragma omp for
for (int i = 1; i < n; i++)
a[i] = i;
}
• Local variables of a subroutine called in a parallel region are
by default private.
Work-Sharing Constructs
• Work-sharing constructs distribute the specified work
to all threads within the current team.
• Types
• Parallel loop
• Parallel section
• Master region
• Single region
• General work-sharing construct (only Fortran)
Parallel Loop

#pragma omp for [parameters]
for ...
• The iterations of the loop are distributed to the threads.
• The scheduling of loop iterations is determined by one of the
scheduling strategies static, dynamic, guided, and runtime.
• There is no synchronization at the beginning.
• All threads of the team synchronize at an implicit barrier if the
parameter nowait is not specified.
• The loop variable is by default private. It must not be modified in
the loop body.
• The expressions in the for-statement are very restricted.
Scheduling Strategies
• Schedule clause
schedule (type [,size])
• Scheduling types:
• static: Chunks of the specified size are assigned in a round-
robin fashion to the threads.
• dynamic: The iterations are broken into chunks of the
specified size. When a thread finishes the execution of a
chunk, the next chunk is assigned to that thread.
• guided: Similar to dynamic, but the size of the chunks is
exponentially decreasing. The size parameter specifies the
smallest chunk. The initial chunk is implementation
dependent.
• runtime: The scheduling type and the chunk size is
determined via environment variables.
Example: Dynamic Scheduling
main(){
int i, a[1000];
#pragma omp parallel
{
#pragma omp for schedule(dynamic, 4)
for (int i=0; i<1000;i++)
a[i] = omp_get_thread_num();
#pragma omp for schedule(guided)
for (int i=0; i<1000;i++)
a[i] = omp_get_thread_num();
}
}
Reductions

reduction(operator: list)

• This clause performs a reduction on the variables that
appear in list, with the operator operator.
• Variables must be shared scalars.
• operator is one of the following:
• +, *, -, &, ^, |, &&, ||
• A reduction variable may only appear in statements of
the following forms:
• x = x operator expr
• x binop= expr
• x++, ++x, x--, --x
Example: Reduction
#pragma omp parallel for reduction(+: a)
for (i=0; i<n; i++) {
a = a + b[i];
}
Classification of Variables
• private(var-list)
• Variables in var-list are private.
• shared(var-list)
• Variables in var-list are shared.
• default(private | shared | none)
• Sets the default for all variables in this region.
• firstprivate(var-list)
• Variables are private and are initialized with the value of the
shared copy before the region.
• lastprivate(var-list)
• Variables are private and the value of the thread executing the
last iteration of a parallel loop in sequential order is copied to
the variable outside of the region.
Scoping Variables with Private Clause
• The values of the shared copies of i and j are undefined on exit
from the parallel region.
• The private copies of j are initialized in the parallel region to 2.
int i, j;
i = 1;
j = 2;
#pragma omp parallel private(i) firstprivate(j)
{
i = 3;
j = j + 2;
printf("%d %d\n", i, j);
}
Parallel Section
• Each section of a parallel section is executed once by
one thread of the team.
• Threads that finished their section wait at the implicit
barrier at the end of the section construct.
#pragma omp sections [parameters]
{
[#pragma omp section]
block
[#pragma omp section
block ]
}
Example: Parallel Section
main(){
int i, a[1000], b[1000];
#pragma omp parallel private(i)
{
#pragma omp sections
{
#pragma omp section
for (int i=0; i<1000; i++)
a[i] = 100;
#pragma omp section
for (int i=0; i<1000; i++)
b[i] = 200;
}
}
}
OMP Workshare (Fortran only)
• The WORKSHARE directive divides the work of
executing the enclosed code into separate units of
work and distributes the units amongst the threads.
• An implementation of the WORKSHARE directive
must insert any synchronization that is required to
maintain standard Fortran semantics.
• There is an implicit barrier at the end of the workshare
region.
!$OMP WORKSHARE [parameters]
block
!$OMP END WORKSHARE [NOWAIT]
Sharing Work in a Fortran 90 Array Statement
A(1:N)=B(2:N+1)+C(1:N)
• Each evaluation of an array expression for an
individual index is a unit of work.
• The assignment to an individual array element is also
a unit of work.
Master / Single Region

#pragma omp master
block

#pragma omp single [parameters]
block

• A master or single region enforces that only a single thread executes
the enclosed code within a parallel region.
• Common
• No synchronization at the beginning of the region.
• Different
• The master region is executed by the master thread, while the single
region can be executed by any thread.
• The master region is skipped by the other threads, while all threads
are synchronized at the end of a single region.
Combined Work-Sharing and Parallel Constructs
• #pragma omp parallel for
• #pragma omp parallel sections
• !$OMP PARALLEL WORKSHARE
Barrier

#pragma omp barrier

• The barrier synchronizes all the threads in a team.
• When encountered, each thread waits until all of the other threads in that
team have reached this point.
Critical Section

#pragma omp critical [(Name)]
{ ... }
• Mutual exclusion
• A critical section is a block of code that can be executed by only one
thread at a time.
• Critical section name
• A thread waits at the beginning of a critical section until no other
thread is executing a critical section with the same name.
• All unnamed critical directives map to the same name.
• Critical section names are global entities of the program. If a name
conflicts with any other entity, the behavior of the program is
unspecified.
• Avoid long critical sections
Example: Critical Section

main(){
int a[N], b[N];
int ia = 0;
int ib = 0;
int itotal = 0;
for (int i=0; i<N; i++)
{
a[i] = i;
b[i] = N-i;
}
#pragma omp parallel
{
#pragma omp sections
{
#pragma omp section
{
for (int i=0; i<N; i++)
ia = ia + a[i];
#pragma omp critical (c1)
{
itotal = itotal + ia;
}}
#pragma omp section
{
for (int i=0; i<N; i++)
ib = ib + b[i];
#pragma omp critical (c1)
{
itotal = itotal + ib;
}}
}}
}
Atomic Statements

#pragma omp atomic
expression-stmt

• The atomic directive ensures that a specific memory
location is updated atomically.
• The statement has to have one of the following forms:
• x binop= expr
• x++ or ++x
• x-- or --x
• where x is an lvalue expression with scalar type and expr
does not reference the object designated by x.
• All parallel assignments to the location must be
protected with the atomic directive.
Translation of Atomic

#pragma omp atomic
x += expr;

can be rewritten as

xtmp = expr;
#pragma omp critical (name)
{
x = x + xtmp;
}

• Only the load and store of x are protected; expr is evaluated outside
the critical section.
Simple Locks
• A lock can be held by only one thread at a time.
• A lock is represented by a lock variable of type
omp_lock_t.
• The thread that obtained a simple lock cannot set it
again.
• Operations
• omp_init_lock(&lockvar): initialize a lock
• omp_destroy_lock(&lockvar): destroy a lock
• omp_set_lock(&lockvar): set lock
• omp_unset_lock(&lockvar): free lock
• logicalvar = omp_test_lock(&lockvar): check lock and possibly
set lock, returns true if lock was set by the executing thread.
Example: Simple Lock

#include <omp.h>
int id;
omp_lock_t lock;
omp_init_lock(&lock);
#pragma omp parallel shared(lock) private(id)
{
id = omp_get_thread_num();
omp_set_lock(&lock); // only a single thread writes at a time
printf("My thread num is: %d\n", id);
omp_unset_lock(&lock);
while (!omp_test_lock(&lock))
other_work(id); // lock not obtained
real_work(id); // lock obtained
omp_unset_lock(&lock); // lock freed
}
omp_destroy_lock(&lock);
Nestable Locks
• Unlike simple locks, nestable locks can be set multiple
times by a single thread.
• Each set operation increments a lock counter.
• Each unset operation decrements the lock counter.
• If the lock counter is 0 after an unset operation, the
lock can be set by another thread.
Ordered Construct
• Construct must be within the dynamic extent of an
omp for construct with an ordered clause.
• Ordered constructs are executed strictly in the order in
which they would be executed in a sequential
execution of the loop.
#pragma omp for ordered
for (...)
{ ...
#pragma omp ordered
{ ... }
...
}
Example with ordered clause

#pragma omp for ordered
for (...)
{ S1
#pragma omp ordered
{ S2 }
S3
}

[Diagram: for iterations i = 1..N distributed over the threads, the S1 and S3
parts run concurrently, while the S2 parts execute strictly in iteration
order; a barrier follows the loop]
Flush
• The flush directive synchronizes copies in register or cache of the
executing thread with main memory.
• It synchronizes those variables in the given list or, if no list is
specified, all shared variables accessible in the region.
• It does not update implicit copies at other threads.
• Load/stores executed before the flush in program order have to
be finished.
• Load/stores following the flush in program order are not allowed
to be executed before the flush.
• A flush is executed implicitly for some constructs, e.g. begin and
end of a parallel region, end of work-sharing constructs ...
#pragma omp flush [(list)]
Example: Flush

#define MAXTHREAD 100
int iam, neigh, isync[MAXTHREAD+1];
isync[0] = 1; // isync[1..MAXTHREAD] initialized to 0
#pragma omp parallel private(iam, neigh)
{
iam = omp_get_thread_num() + 1;
neigh = iam - 1;
// wait for neighbor
while (isync[neigh] == 0) {
#pragma omp flush(isync)
}
// do my work
work();
isync[iam] = 1; // I am done
#pragma omp flush(isync)
}

[Diagram: the isync array, initially 1,0,0,0,...; each thread spins until its
left neighbor has set its flag]
Lastprivate example

k = 0;
#pragma omp parallel
{
#pragma omp for lastprivate(k)
for (i=0; i<100; i++) {
a[i] = b[i] + b[i+1];
k = 2*i;
}
}
// The value of k is 198
Copyprivate Example
• Copyprivate
• Clause only for single region.
• Variables must be private in enclosing parallel region.
• Value of executing thread is copied to all other threads.
#pragma omp parallel private(x)
{
#pragma omp single copyprivate(x)
{
getValue(x);
}
useValue(x);
}
Other Copyprivate Example
float read_next( ) {
float * tmp;
float return_val;
#pragma omp single copyprivate(tmp)
{
tmp = (float *) malloc(sizeof(float));
}
#pragma omp master
{
get_float( tmp );
}
#pragma omp barrier
return_val = *tmp;
#pragma omp barrier
#pragma omp single
{
free(tmp);
}
return return_val;
}
Runtime Routines for Threads (1)
• Determine the number of threads for parallel regions
• omp_set_num_threads(count)
• Query the maximum number of threads for team
creation
• numthreads = omp_get_max_threads()
• Query number of threads in the current team
• numthreads = omp_get_num_threads()
• Query own thread number (0..n-1)
• iam = omp_get_thread_num()
• Query number of processors
• numprocs = omp_get_num_procs()
Runtime Routines for Threads (2)
• Query state
logicalvar = omp_in_parallel()
• Allow runtime system to determine the number of
threads for team creation
omp_set_dynamic(logicalexpr)
• Query whether runtime system can determine the
number of threads
logicalvar= omp_get_dynamic()
• Allow nesting of parallel regions
omp_set_nested(logicalexpr)
• Query nesting of parallel regions
logicalvar= omp_get_nested()
Environment Variables
• OMP_NUM_THREADS=4
• Number of threads in a team of a parallel region
• OMP_SCHEDULE="dynamic"
OMP_SCHEDULE="GUIDED,4"
• Selects scheduling strategy to be applied at runtime
• OMP_DYNAMIC=TRUE
• Allow runtime system to determine the number of threads.
• OMP_NESTED=TRUE
• Allow nesting of parallel regions.
OpenMP 3.0
• Introduced May 2008
• OpenMP 3.1, July 2011
Explicit Tasking

• Explicit creation of tasks

#pragma omp parallel
{
#pragma omp single
{
for (elem = l->first; elem; elem = elem->next)
#pragma omp task
process(elem);
}
// all tasks are complete by this point
}

• Task scheduling
• Tasks can be executed by any thread in the team.
• Barrier
• All tasks created in the parallel region have to be finished.
Tasks

#pragma omp task [clause list]
{ ... }

Clauses
• if (scalar-expression)
• FALSE: execution of the new task starts immediately by the creating thread.
• The suspended task may not be resumed until the new task is finished.
• untied
• The task is not tied to the thread starting its execution. It might be
rescheduled to another thread.
• default(shared|none), private, firstprivate, shared
• If no default clause is present, the implicit data-sharing attribute is
firstprivate.

Binding
• The binding thread set of the task region is the current team.
• A task region binds to the innermost enclosing parallel region.
Example: Tree Traversal
struct node {
struct node *left;
struct node *right;
};
void traverse( struct node *p ) {
  if (p->left) {
    #pragma omp task   // p is firstprivate by default
    traverse(p->left);
  }
  if (p->right) {
    #pragma omp task   // p is firstprivate by default
    traverse(p->right);
  }
  process(p);
}
54
#pragma omp taskwait
Task Wait
• Waits for completion of immediate child tasks
• Child tasks: Tasks generated since the beginning of the current task.
55
OpenMP 4
• Task dependencies via new depend clause
• depend(dependence-type : list)
  – where dependence-type = in | out | inout
• Dependences are on previously generated sibling tasks.
• in: the generated task will be a dependent task of all
previously generated sibling tasks that reference at least one
of the list items in an out or inout dependence-type list.
• out & inout: the generated task will be a dependent task
of all previously generated sibling tasks that reference at least
one of the list items in an in, out, or inout dependence-type
list.
56
#pragma omp taskyield
Taskyield
• The taskyield construct specifies that the current task can be
suspended in favor of execution of a different task.
• Explicit task scheduling point
• Implicit task scheduling points:
• Task creation
• End of a task
• Taskwait
• Barrier synchronization
57
Switch task while waiting
#include <omp.h>

void foo ( omp_lock_t * lock, int n )
{
int i;
for ( i = 0; i < n; i++ )
#pragma omp task
{
something_useful();
while ( !omp_test_lock(lock) ) {
#pragma omp taskyield
}
something_critical();
omp_unset_lock(lock);
}
}
65
Terms
• tied task A task that, when its task region is
suspended, can be resumed only by the same thread
that suspended it. That is, the task is tied to that
thread.
• untied task (untied clause) A task that, when its task
region is suspended, can be resumed by any thread in
the team. That is, the task is not tied to any thread.
• undeferred task (if clause is false) A task for which
execution is not deferred with respect to its generating
task region. That is, its generating task region is
suspended until execution of the undeferred task is
completed.
66
Terms
• included task A task for which execution is
sequentially included in the generating task region.
That is, it is undeferred and executed immediately by
the encountering thread. It has its own data
environment.
• merged task (mergeable clause) A task whose data
environment is the same as that of its generating task
region.
• final task (final clause) A task that forces all of its
child tasks to become final and included tasks.
67
Mergeable tasks
#include <stdio.h>
void foo ( )
{
int x = 2;
#pragma omp task mergeable
{
x++;
}
#pragma omp taskwait
printf("%d\n",x); // prints 2 or 3: x is firstprivate by default, but a merged task shares the parent's x
}
68
Mergeable tasks
#include <stdio.h>
void foo ( )
{
int x = 2;
#pragma omp task shared(x) mergeable
{
x++;
}
#pragma omp taskwait
printf("%d\n",x); // prints 3
}
69
Synchronization in Tasks – Potential Deadlock
void work()
{
#pragma omp task
{ //Task 1
#pragma omp task
{ //Task 2
#pragma omp critical //Critical region 1
{/*do work here */ }
}
#pragma omp critical //Critical Region 2
{ //Capture data for the following task
#pragma omp task
{ /* do work here */ } //Task 3
}
}
}
• Task creation is a task scheduling point: while still holding Critical Region 2, the thread may switch to the suspended Task 2, which then blocks on the same unnamed critical region, resulting in deadlock.
70
Collapsing of loops
• Handles multi-dimensional perfectly nested loops
• Larger iteration space ordered according to sequential
execution.
• Schedule clause applies to new iteration space
#pragma omp parallel for collapse(2)
for (i=0; i<n; i++)
for (j=0; j<n; j++)
for (k=0; k<n; k++)
{
.....
}
71
Guaranteed Scheduling
• Same work distribution if
• Same number of iterations, schedule static with same
chunksize
• Both regions bind to same parallel region
!$omp do schedule(static)
do i=1,n
a(i) = ....
end do
!$omp end do nowait
!$omp do schedule(static)
do i=1,n
.... = a(i)
end do
72
Scheduling strategy auto for parallel loops
• New scheduling strategy auto
• The mapping of iterations to threads is left entirely to the compiler and runtime system.
73
Nested Parallelism
• Before OpenMP 3.0 there was only a single copy of the internal control
variable specifying the number of threads in a team.
• omp_set_num_threads()
  • Could be called only outside of parallel regions.
  • The value also applied to nested parallelism:
  • All teams had the same size.
  • Exception: the num_threads clause of a parallel region.
• OpenMP 3.0 supports individual copies
• There is one copy per task.
• Teams might have different sizes.
74
OpenMP 4
• SIMD support
• Directive for loops: guarantees that loop can be executed in a
SIMD fashion
• Directive for omp loops: iterations are parallelized and those
assigned to a thread are executed with SIMD instructions
• Target construct for accelerators
• User-defined reductions
• Cancellation of a parallel region
• Affinity
• Places: Thread, core, socket
• Affinity policies: spread, close, master
76
Summary
• OpenMP is a quasi-standard for shared memory
programming
• Based on Fork-Join Model
• Parallel region and work sharing constructs
• Declaration of private or shared variables
• Reduction variables
• Scheduling strategies
• Synchronization via Barrier, Critical section, Atomic,
locks, nestable locks
• Task concept
• SIMD and accelerator support.