Parallel Programming Models (Shared Address Space)
5th week
OpenMP Is …
An Application Program Interface (API) for explicitly directing multi-threaded, shared-memory parallelism
  Three API components
    Compiler directives
    Runtime library routines
    Environment variables
Portable
  APIs for C/C++ and Fortran
  Multiple platforms: most Unix platforms and Windows NT
OpenMP Is … (Cont’d)
Standardized
  Jointly proposed by a group of major computer hardware and software vendors
  Expected to become an ANSI standard
What does OpenMP stand for?
  Open specifications for Multi-Processing
  Collaborative work with interested parties from the hardware and software industry, government, and academia
OpenMP Is Not …
Meant for distributed-memory parallel systems by itself
Necessarily implemented identically by all vendors
Guaranteed to make the most efficient use of shared memory
  There are no data locality constructs
History
Directive-based Fortran programming extensions
  Introduced in the early 90's by vendors of shared-memory machines
  Augment a serial Fortran program with directives that specify the loops to be parallelized
  The compiler is responsible for parallelizing such loops across the SMP processors
  Implementations were all functionally similar, but were diverging (as usual)
History (Cont’d)
ANSI X3H5
  In 1994
  Rejected due to waning interest as distributed-memory machines became popular
OpenMP
  In the spring of 1997
  Took over where ANSI X3H5 had left off, as newer shared-memory machine architectures became popular
Goals
Standardization
  Provide a standard among a variety of shared-memory architectures (platforms)
  High-level interfaces to thread programming
Lean and Mean
  A simple and limited set of directives for shared address space programming
  Just 3 or 4 directives are enough to represent significant parallelism
Hello World Program: Pthread Version
#include <pthread.h>
#include <stdio.h>

void* thrfunc(void* arg)
{
    printf("hello from thread %d\n", *(int*)arg);
    return NULL;
}

int main(void)
{
    pthread_t thread[4];
    pthread_attr_t attr;
    int arg[4] = {0, 1, 2, 3};
    int i;

    // set up joinable threads with system scope
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    // create N threads
    for (i = 0; i < 4; i++)
        pthread_create(&thread[i], &attr, thrfunc, (void*)&arg[i]);

    // wait for the N threads to finish
    for (i = 0; i < 4; i++)
        pthread_join(thread[i], NULL);

    return 0;
}
Hello World: OpenMP Version
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    printf("hello from thread %d\n", omp_get_thread_num());
    return 0;
}
Goals (Cont’d)
Ease of use
  Incrementally parallelize a serial program
    Unlike the all-or-nothing approach of message passing
  Implement both coarse-grain and fine-grain parallelism
Portability
  Fortran (77, 90, and 95), C, and C++
  Public forum for API and membership
Matrix Multiplication: Sequential Version
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        temp = 0;
        for (k = 0; k < N; k++)
            temp += a[i][k] * b[k][j];
        c[i][j] = temp;
    }
}
Matrix Multiplication: MPI Version
/* nprocs stands for the slide's "# of processors"; a, b, c are N x N int
   matrices and N is assumed to be divisible by nprocs */

/* Determine block size */
BlkSz = N / nprocs;          /* rows of a and c handled by each process */
start = BlkSz * Rank;
end   = start + BlkSz;

/* Distribute blocks: all of b, and BlkSz rows of a per process */
MPI_Bcast(b, N * N, MPI_INT, 0, MPI_COMM_WORLD);
if (Rank == 0) {
    for (i = 1; i < nprocs; i++)
        MPI_Send(&a[BlkSz * i][0], BlkSz * N, MPI_INT, i, TAG_INIT, MPI_COMM_WORLD);
} else {
    MPI_Recv(&a[start][0], BlkSz * N, MPI_INT, 0, TAG_INIT, MPI_COMM_WORLD, &status);
}

/* Calculate partial matrix multiplication */
for (i = start; i < end; i++) {
    for (j = 0; j < N; j++) {
        temp = 0;
        for (k = 0; k < N; k++)
            temp += a[i][k] * b[k][j];
        c[i][j] = temp;
    }
}

/* Gather partial result */
if (Rank == 0) {
    for (i = 1; i < nprocs; i++)
        MPI_Recv(&c[BlkSz * i][0], BlkSz * N, MPI_INT, i, TAG_END, MPI_COMM_WORLD, &status);
} else {
    MPI_Send(&c[start][0], BlkSz * N, MPI_INT, 0, TAG_END, MPI_COMM_WORLD);
}
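Matrix Multiplication: OpenMP Version
/* Add directive: the only change to the sequential version.
   j and k are listed as private because only the parallelized loop index i
   is private by default in C/C++. */
#pragma omp parallel for private(j, k, temp) schedule(static)
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        temp = 0;
        for (k = 0; k < N; k++)
            temp += a[i][k] * b[k][j];
        c[i][j] = temp;
    }
}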
Programming Model
Thread-Based Parallelism
  A shared-memory process with multiple threads
  Based upon multiple threads in the shared-memory programming paradigm
Explicit Parallelism
  Explicit (not automatic) programming model
  Offers the programmer full control over parallelization
Programming Model (Cont’d)
Fork-Join Model
  All OpenMP programs begin as a single sequential process: the master thread
  Fork at the beginning of parallel constructs
    The master thread creates a team of parallel threads
    The statements enclosed by the parallel region construct are executed in parallel
  Join at the end of parallel constructs
    The threads synchronize and terminate after completing the statements in the parallel construct
    Only the master thread remains
[Figure: Fork-Join Model]
Programming Model (Cont’d)
Compiler Directive Based
  Parallelism is specified through compiler directives embedded in C/C++ or Fortran source code
Nested Parallelism Support
  Parallel constructs may contain other parallel constructs
  Implementation-dependent
Dynamic Threads
  The number of threads used to execute parallel regions can be altered
  Implementation-dependent
General Code Structure
#include <omp.h>

int main(void)
{
    int var1, var2, var3;

    /* Serial code ... */

    /* Beginning of parallel section. Fork a team of threads.
       Specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads ... */

        /* All threads join master thread and disband */
    }

    /* Resume serial code ... */
    return 0;
}
Terms
Construct
  A statement that consists of a directive and the subsequent structured block
Directive
  A C or C++ #pragma followed by the omp identifier, other text, and a new line
  The directive specifies program behavior
Structured block
  A statement that has a single entry and a single exit
  A compound statement is a structured block if its execution always begins at the opening { and always ends at the closing }
Terms (Cont’d)
Lexical extent
  The code textually enclosed between the beginning and the end of a structured block following a directive
  The static (lexical) extent of a directive does not span multiple routines or code files
Orphaned directive
  An OpenMP directive that appears independently of another enclosing directive
  It exists outside of another directive's static (lexical) extent
  It may span routines and possibly code files
Terms (Cont’d)
Dynamic extent (region)
  All statements in the lexical extent, plus any statement inside a function that is executed as a result of the execution of statements within the lexical extent
  The dynamic extent of a directive includes both its static (lexical) extent and the extents of its orphaned directives
Master thread
  The thread that creates a team when a parallel region is entered
Team
  One or more threads cooperating in the execution of a construct
Lexical/Orphan/Dynamic Extent
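/* Static (lexical) extent of the parallel construct */
#pragma omp parallel
{
    …
    #pragma omp for
    for (i = 0; i < n; i++) {
        for (j = 0; j < m; j++)
            sub1();
        sub2();
    }
}

/* Orphaned directives: outside the static extent, inside the dynamic extent */
sub1()
{
    #pragma omp critical
    …
}

sub2()
{
    #pragma omp sections
    …
}

Dynamic extent = static extent + orphaned directives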
Terms (Cont’d)
Parallel region
  Statements that bind to an OpenMP parallel construct and may be executed by multiple threads
Serial region
  Statements executed only by the master thread outside of the dynamic extent of any parallel region
Private
  A private variable names a block of storage that is unique to the thread making the reference
Shared
  A shared variable names a single block of storage
  All threads in a team that access this variable access this single block of storage
OpenMP Components
Directives
  Work-sharing constructs
  Data environment clauses
  Synchronization constructs
Runtime libraries
Environment variables
Directive Format
#pragma omp  directive-name  [clause, …]  newline

#pragma omp      Start of OpenMP C/C++ directives
directive-name   A valid OpenMP directive; appears after the pragma and before any clauses
[clause, …]      Clauses can be in any order and can be repeated
newline          Required; precedes the structured block enclosed by this directive

Ex) #pragma omp parallel default(shared) private(beta, pi)
Directive Format (Cont’d)
General Rules
  Directives follow the conventions of the C/C++ standards
  Case sensitive
  Only one directive-name per directive
  Each directive applies to at most one succeeding structured block
  A long directive can be extended over multiple lines by escaping the newline character with a backslash ("\") at the end of a directive line
Parallel Directive
Purpose
  A block of code to be executed by multiple threads
  The fundamental OpenMP parallel construct
Format
  #pragma omp parallel [clause ...] newline
      if (scalar_expression)
      private (list)
      shared (list)
      default (shared | none)
      firstprivate (list)
      reduction (operator: list)
      copyin (list)
  structured_block
Parallel Directive (Cont’d)
Description
  On reaching a PARALLEL directive, a thread creates a team of threads and becomes the master
  The master is a member of that team (id = 0)
  The code is duplicated and all threads execute it
  There is an implied barrier at the end of a parallel section
  Only the master thread continues execution past this point
Parallel Directive (Cont’d)
# of threads
  Determined by the following factors, in order of precedence:
    omp_set_num_threads() library function
    OMP_NUM_THREADS environment variable
    Implementation default
  Threads are numbered from 0 (master thread) to N-1
Parallel Directive (Cont’d)
Clauses
  IF clause
    If present, it must evaluate to .TRUE. (Fortran) or non-zero (C/C++) for a team of threads to be created; otherwise the region is executed serially
  Data scope attribute clauses
Restrictions
  A parallel region must be a structured block that does not span multiple routines or code files
  Only a single IF clause is permitted
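The IF clause can be sketched as follows. This is a minimal illustration, not taken from the slides: the function name `sum_squares` and the cutoff `PAR_THRESHOLD` are hypothetical, and the value 1000 is an arbitrary assumption about when forking a team starts to pay off.

```c
/* Hypothetical cutoff: below this size a team is assumed not to be worth forking. */
#define PAR_THRESHOLD 1000

/* Returns 0^2 + 1^2 + ... + (n-1)^2.
   A team is created only when the if expression is non-zero;
   otherwise the loop runs serially on the encountering thread. */
long long sum_squares(int n)
{
    long long total = 0;
    int i;

    #pragma omp parallel for if (n >= PAR_THRESHOLD) reduction(+:total)
    for (i = 0; i < n; i++)
        total += (long long)i * i;

    return total;
}
```

The result is the same whether or not the team is created; only the execution strategy changes. With GCC, OpenMP is enabled with -fopenmp.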
Parallel Directive (Cont’d)
Dynamic Threads
  By default, a program uses the same number of threads to execute each parallel region
  The run-time system can dynamically adjust the number of threads
    omp_set_dynamic() library function
    OMP_DYNAMIC environment variable
Nested Parallel Regions
  A parallel region nested within another parallel region results in the creation of a new team, consisting of one thread by default
  Implementation-dependent
Example of Parallel Region
#include <omp.h>
#include <stdio.h>

int main(void)
{
    int nthreads, tid;

    /* Fork a team of threads giving them their own copies of variables */
    #pragma omp parallel private(nthreads, tid)
    {
        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join master thread and terminate */

    return 0;
}
Work-Sharing Constructs
Description
  Divides the execution of the enclosed region among the members of the team
  An implied barrier at the end of the construct
  No implied barrier upon entry to the construct
  Work-sharing constructs do not launch new threads
Construct Types
#pragma omp for
  Shares iterations of a loop across the team
  Represents a type of data parallelism
#pragma omp single
  Serializes a section of code
#pragma omp sections
  Breaks work into separate, discrete sections
  Each section is executed by a thread
  Can be used to implement a type of functional parallelism
Construct Types (Cont’d)
#pragma omp parallel for
  Simplified form of #pragma omp parallel + #pragma omp for
#pragma omp parallel sections
  Simplified form of #pragma omp parallel + #pragma omp sections
Work-Sharing Constructs
Restrictions
  Must be enclosed dynamically within a parallel region to execute in parallel
  Must be encountered by all members of a team or none at all
  Successive work-sharing constructs must be encountered in the same order by all members of a team
#pragma omp for
Purpose
  The iterations of the loop immediately following this directive are executed in parallel by the team
  This assumes a parallel region has already been initiated
  Otherwise the loop executes serially on a single processor
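The two cases above (inside and outside a parallel region) can be seen with an orphaned for directive. The sketch below is illustrative only; the function names `scale` and `scale_both_ways` are hypothetical and do not come from the slides.

```c
/* Orphaned work-sharing directive: it is not lexically inside a parallel
   construct, so it binds to whatever parallel region is active when the
   function is called. */
void scale(double *v, int n, double factor)
{
    int i;

    #pragma omp for
    for (i = 0; i < n; i++)
        v[i] *= factor;
}

void scale_both_ways(double *v, int n)
{
    /* Called inside a parallel region: the iterations are shared by the team. */
    #pragma omp parallel
    {
        scale(v, n, 2.0);
    }

    /* Called outside any parallel region: the loop runs serially. */
    scale(v, n, 3.0);
}
```

Both calls produce the same kind of result (every element multiplied once by the factor); only the first one distributes the iterations across threads.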
#pragma omp for (Cont’d)
Format
  #pragma omp for [clause ...] newline
      schedule (type [,chunk])
      ordered
      private (list)
      firstprivate (list)
      lastprivate (list)
      reduction (operator: list)
      nowait
  for_loop
Clauses
  SCHEDULE clause
    Describes how iterations of the loop are divided among the threads in the team
    The default schedule is implementation-dependent
    STATIC
      Loop iterations are divided into pieces of size chunk and then statically assigned to threads
      If chunk is not specified, the iterations are divided evenly (if possible) and contiguously among the threads
SCHEDULE Clause
  DYNAMIC
    Loop iterations are divided into pieces of size chunk and dynamically scheduled among the threads
    When a thread finishes one chunk, it is dynamically assigned another
    The default chunk size is 1
  GUIDED
    The chunk size is reduced exponentially with each dispatched piece of the iteration space
    The chunk size specifies the minimum number of iterations to dispatch each time
    The default chunk size is 1
SCHEDULE Clause (Cont’d)
  RUNTIME
    The scheduling decision is deferred until run time and taken from the environment variable OMP_SCHEDULE
    It is illegal to specify a chunk size for this clause
ORDERED clause
  Must be present when ORDERED directives are enclosed within the for directive
NOWAIT clause
  Threads do not synchronize at the end of the parallel loop
  Threads proceed directly to the next statements after the loop
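The STATIC and GUIDED schedules and the NOWAIT clause can be combined as in the sketch below. It is an illustration under assumptions: the function name `fill_tables` and the chunk sizes 100 and 8 are arbitrary choices, not values from the slides.

```c
#define N 1000

/* Fills sq[i] = i*i and cube[i] = i*i*i with two independent loops. */
void fill_tables(long sq[N], long cube[N])
{
    int i;

    #pragma omp parallel private(i)
    {
        /* Fixed chunks of 100 iterations, assigned to threads up front.
           nowait is safe here only because the next loop never reads sq. */
        #pragma omp for schedule(static, 100) nowait
        for (i = 0; i < N; i++)
            sq[i] = (long)i * i;

        /* Chunks shrink as the loop progresses, but never below 8 iterations. */
        #pragma omp for schedule(guided, 8)
        for (i = 0; i < N; i++)
            cube[i] = (long)i * i * i;
    }
}
```

If the second loop depended on values written by the first, the nowait clause would have to be removed so that the implied barrier separates the two loops.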
SCHEDULE Clause (Cont’d)
Restrictions
  The for loop cannot be a do-while loop or a loop without loop control
  The loop iteration variable must be an integer, and the loop control parameters must be the same for all threads
  Program correctness must not depend upon which thread executes a particular iteration
  The chunk size must be specified as a loop-invariant integer expression
  The C/C++ for directive requires that the for loop have canonical shape
  ORDERED and SCHEDULE clauses may appear once each
Example of For Directive
#include <omp.h>
#define CHUNKSIZE 100
#define N 1000

int main(void)
{
    int i, chunk;
    float a[N], b[N], c[N];

    /* Some initializations */
    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a,b,c,chunk) private(i)
    {
        #pragma omp for schedule(dynamic,chunk) nowait
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    } /* end of parallel section */

    return 0;
}
#pragma omp sections
Purpose
  A non-iterative work-sharing construct
  The enclosed section(s) of code are divided among the threads in the team
  Independent SECTION directives are nested within a SECTIONS directive
  Each SECTION is executed once by a thread in the team
  Different sections may be executed by different threads
#pragma omp sections (Cont’d)
Format
  #pragma omp sections [clause ...] newline
      private (list)
      firstprivate (list)
      lastprivate (list)
      reduction (operator: list)
      nowait
  {
      #pragma omp section newline
          structured_block
      #pragma omp section newline
          structured_block
  }
#pragma omp sections (Cont’d)
Clauses
  An implied barrier at the end of a SECTIONS directive, unless the nowait clause is used
Questions
  What happens if the number of threads and the number of SECTIONs are different?
    More threads than SECTIONs?
    Fewer threads than SECTIONs?
  Which thread executes which SECTION?
Restriction
  SECTION directives must occur within the lexical extent of an enclosing SECTIONS directive
Example of Sections Directive
#include <omp.h>
#define N 1000

int main(void)
{
    int i;
    float a[N], b[N], c[N];

    /* Some initializations */
    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;

    #pragma omp parallel shared(a,b,c) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i = 0; i < N/2; i++)
                c[i] = a[i] + b[i];

            #pragma omp section
            for (i = N/2; i < N; i++)
                c[i] = a[i] + b[i];
        } /* end of sections */
    } /* end of parallel section */

    return 0;
}
#pragma omp single
Purpose
  The enclosed code is executed by only one thread in the team
  May be useful when dealing with sections of code that are not thread-safe (such as I/O)
#pragma omp single (Cont’d)
Format
  #pragma omp single [clause ...] newline
      private (list)
      firstprivate (list)
      nowait
  structured_block
Clauses
  Threads in the team that do not execute the SINGLE directive wait at the end of the enclosed code block, unless a nowait clause is specified
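A small sketch of the SINGLE directive follows. The function name `init_once` and the table-filling task are hypothetical, chosen only to show that the block runs once per parallel region regardless of team size.

```c
/* Fills table[0..n-1] with 1..n inside a single block and returns how many
   times that block ran (expected: 1, whatever the team size). */
int init_once(int *table, int n)
{
    int runs = 0;
    int i;

    #pragma omp parallel shared(table, runs) private(i)
    {
        /* Exactly one thread fills the table; the others wait at the
           implied barrier at the end of the single block. */
        #pragma omp single
        {
            for (i = 0; i < n; i++)
                table[i] = i + 1;
            runs++;
        }

        /* Past the barrier, every thread may safely read table[]. */
    }
    return runs;
}
```

Adding nowait to the single directive would remove the barrier, in which case the other threads could not rely on the table being filled.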
#pragma omp parallel for
#include <omp.h>
#define N 1000
#define CHUNKSIZE 100

int main(void)
{
    int i, chunk;
    float a[N], b[N], c[N];

    /* Some initializations */
    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;

    #pragma omp parallel for shared(a,b,c,chunk) private(i) schedule(static,chunk)
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    return 0;
}
Data Environment
#pragma omp threadprivate
Data scope clauses
#pragma omp threadprivate
Purpose
  Makes global file-scope variables local and persistent to a thread across the execution of multiple parallel regions
Format
  #pragma omp threadprivate (list)
#pragma omp threadprivate (Cont’d)
Notes
  The directive must appear after the declaration of the listed variables/common blocks
  Data written by one thread is not visible to other threads
  On first entry to a parallel region, data in THREADPRIVATE variables should be assumed undefined, unless a COPYIN clause is specified in the PARALLEL directive
  THREADPRIVATE variables differ from PRIVATE variables because they are persistent
#pragma omp threadprivate (Cont’d)
Restrictions
  Data in THREADPRIVATE objects is guaranteed to persist only if the dynamic threads mechanism is "turned off" and the number of threads in different parallel regions remains constant
    The default setting of dynamic threads is undefined
  The directive must appear after every declaration of a thread-private variable/common block
Example of Threadprivate Directive
#include <omp.h>
#include <stdio.h>

int alpha[10], beta[10], i;
#pragma omp threadprivate(alpha)

int main(void)
{
    /* First parallel region */
    #pragma omp parallel private(i,beta)
    for (i = 0; i < 10; i++)
        alpha[i] = beta[i] = i;

    /* Second parallel region */
    #pragma omp parallel
    printf("alpha[3]= %d and beta[3]= %d\n", alpha[3], beta[3]);

    return 0;
}
Data Scope Clauses
Data scope attribute clauses
  Explicitly define how variables should be scoped
  An important consideration for OpenMP programming is the understanding and use of data scoping
  Most variables are shared by default
    Global variables include
      File-scope variables, static variables
    Private variables include
      Loop index variables
      Stack variables in subroutines called from parallel regions
Kinds of Data Scope Clauses
#pragma … private
#pragma … firstprivate
#pragma … lastprivate
#pragma … shared
#pragma … default
#pragma … reduction
#pragma … copyin
Data Scope Clauses (Cont’d)
Used in conjunction with several directives to control the scoping of enclosed variables
Control the data environment during execution of parallel constructs
  How and which data variables in the serial section of the program are transferred to the parallel sections of the program (and back)
  Which variables will be visible to all threads in the parallel sections and which variables will be privately allocated to each thread
Effective only within their lexical/static extent
PRIVATE Clause
Purpose
  Declares variables in its list to be private to each thread
Format
  private (list)
Behavior
  A new object of the same type is declared once for each thread in the team
  All references to the original object are replaced with references to the new object
  The new object is uninitialized for each thread
Comparison Between PRIVATE and THREADPRIVATE

                 PRIVATE                                     THREADPRIVATE
Data item        C/C++: variable                             C/C++: variable
Where declared   At start of region or work-sharing group    In declarations of each routine using block or global file scope
Persistent       No                                          Yes
Extent           Lexical only, unless passed as an           Dynamic
                 argument to subroutine
Initialized      With FIRSTPRIVATE                           With COPYIN
Shared Clause
Purpose
  Declares variables in its list to be shared among all threads in the team
Format
  shared (list)
Notes
  A shared variable exists in only one memory location, and all threads can read or write to that address
  It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables
Default Clause
Purpose
  Allows the user to specify a default SHARED or NONE scope for all variables in the lexical extent of any parallel region (a PRIVATE default is available only in Fortran)
Format
  default (shared | none)
Notes
  Specific variables can be exempted from the default using the PRIVATE, SHARED, FIRSTPRIVATE, LASTPRIVATE, and REDUCTION clauses
Default Clause (Cont’d)
Restrictions
  Only one DEFAULT clause can be specified on a PARALLEL directive
Firstprivate Clause
Purpose
  Combines the behavior of the PRIVATE clause with automatic initialization of the variables in its list
Format
  firstprivate (list)
Notes
  Listed variables are initialized according to the value of their original objects prior to entry into the parallel or work-sharing construct
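A minimal sketch of FIRSTPRIVATE follows; the function name `add_offset` and the array size `NSLOTS` are hypothetical names introduced for illustration.

```c
#define NSLOTS 64

/* Writes out[i] = base + i.
   firstprivate gives every thread its own copy of offset, initialized to the
   value it had before the construct. With private(offset) the copies would
   start uninitialized and the results would be undefined. */
void add_offset(int out[NSLOTS], int base)
{
    int offset = base;
    int i;

    #pragma omp parallel for firstprivate(offset)
    for (i = 0; i < NSLOTS; i++)
        out[i] = offset + i;
}
```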
Lastprivate Clause
Purpose
  Combines the behavior of the PRIVATE clause with a copy from the last loop iteration or section to the original variable object
Format
  lastprivate (list)
Note
  The value copied back into the original variable object is obtained from the sequentially last iteration or section of the enclosing construct
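The copy-back behavior of LASTPRIVATE can be sketched as below. The function name `last_square` is hypothetical, and the sketch assumes n > 0 so that a last iteration exists.

```c
/* Returns the value assigned in the sequentially last iteration (i == n-1).
   Assumes n > 0. */
int last_square(int n)
{
    int sq = -1;   /* replaced by the lastprivate copy-back after the loop */
    int i;

    #pragma omp parallel for lastprivate(sq)
    for (i = 0; i < n; i++)
        sq = i * i;

    return sq;
}
```

Whichever thread happens to run iteration n-1, its private value of sq is the one copied back, so the result matches a serial run.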
Copyin Clause
Purpose
  Provides a means for assigning the same value to THREADPRIVATE variables for all threads in the team
Format
  copyin (list)
Notes
  List contains the names of variables to copy
  The master thread variable is used as the copy source
  The team threads are initialized with its value upon entry into the parallel construct
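COPYIN together with THREADPRIVATE can be sketched as follows. The variable `seed` and the function `all_threads_see` are hypothetical names used only for this illustration.

```c
/* File-scope variable with one persistent copy per thread. */
static int seed = 0;
#pragma omp threadprivate(seed)

/* Returns 1 if every thread in the team saw the master's value of seed. */
int all_threads_see(int value)
{
    int ok = 1;

    seed = value;   /* sets the master thread's copy only */

    #pragma omp parallel copyin(seed) reduction(&&:ok)
    {
        /* copyin initialized this thread's copy from the master's copy. */
        ok = ok && (seed == value);
    }
    return ok;
}
```

Without the copyin clause, the non-master copies of seed would keep whatever value they held before, so the check could fail for any value other than the initial 0.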
Reduction Clause
Purpose
  Performs a reduction on the variables that appear in its list
  A private copy of each list variable is created for each thread
  At the end of the reduction, the reduction operator is applied to all private copies of the shared variable, and the final result is written to the global shared variable
Format
  reduction (operator: list)
Reduction Clause (Cont’d)
Restrictions
  Variables in the list must be named scalar variables
  They must also be declared SHARED in the enclosing context
  Reduction operations may not be associative for real numbers
Reduction Clause (Cont’d)
The reduction variable is used only in statements that have one of the following forms
  x = x op expr
  x = expr op x (except subtraction)
  x binop= expr
  x++
  ++x
  x--
  --x
Reduction Example
#include <omp.h>
#include <stdio.h>

int main(void)
{
    int i, n, chunk;
    float a[100], b[100], result;

    /* Some initializations */
    n = 100;
    chunk = 10;
    result = 0.0;
    for (i = 0; i < n; i++) {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }

    #pragma omp parallel for default(shared) private(i) \
            schedule(static,chunk) reduction(+:result)
    for (i = 0; i < n; i++)
        result = result + (a[i] * b[i]);

    printf("Final result= %f\n", result);
    return 0;
}
Synchronization Constructs
#pragma omp master
#pragma omp critical
#pragma omp barrier
#pragma omp atomic
#pragma omp flush
#pragma omp ordered
Race Condition
Thread A                 Thread B
increment(x)             increment(x)
{                        {
    x = x + 1;               x = x + 1;
}                        }

One possible execution sequence:
  1. Thread A loads the value of x into register A
  2. Thread B loads the value of x into register B
  3. Thread A adds 1 to register A
  4. Thread B adds 1 to register B
  5. Thread A stores register A at location x
  6. Thread B stores register B at location x
Result: x is incremented once instead of twice
Race Condition (Cont’d)
Solutions
  The increment of x must be synchronized between the two threads
  OpenMP provides a variety of synchronization constructs to control how the execution of each thread proceeds relative to other team threads
#pragma omp master
Purpose
  Specifies a region to be executed only by the master thread of the team
  All other threads in the team skip this section of code
  No implied barrier is associated with this directive
Format
  #pragma omp master newline
  structured_block
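Because MASTER has no implied barrier, a value written by the master usually needs an explicit BARRIER before other threads read it. The sketch below illustrates this; the function name `publish_and_check` and its variables are hypothetical.

```c
/* The master publishes a value; every thread then checks that it can see it.
   Returns the number of threads that read a stale value (expected: 0). */
int publish_and_check(int value)
{
    int config = 0;       /* shared: written by the master, read by everyone */
    int mismatches = 0;   /* shared: threads that did not see the new value */

    #pragma omp parallel shared(config, mismatches)
    {
        #pragma omp master
        config = value;

        /* master has no implied barrier, so one is needed before the read. */
        #pragma omp barrier

        if (config != value) {
            #pragma omp critical
            mismatches++;
        }
    }
    return mismatches;
}
```

Removing the barrier would let other threads race ahead and possibly read config before the master has written it.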
#pragma omp critical
Purpose
  Specifies a region of code that must be executed by only one thread at a time
Format
  #pragma omp critical [(name)] newline
  structured_block
#pragma omp critical (Cont’d)
Notes
  Guards against race conditions
    If a thread is executing inside a CRITICAL region, any other thread that reaches it blocks until the first thread exits the region
  The optional name enables multiple different CRITICAL regions to exist
    Different CRITICAL regions with the same name are treated as the same region
    All unnamed CRITICAL sections are treated as the same section
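Named CRITICAL regions can be sketched as follows. The function `count_parity` and the region names `even_lock` and `odd_lock` are hypothetical; any two distinct names would behave the same way.

```c
/* Counts even and odd values. Each counter has its own named critical
   region, so an update of the even counter never blocks an update of the
   odd counter. */
void count_parity(const int *v, int n, int *evens, int *odds)
{
    int e = 0, o = 0;
    int i;

    #pragma omp parallel for shared(e, o)
    for (i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            #pragma omp critical (even_lock)
            e++;
        } else {
            #pragma omp critical (odd_lock)
            o++;
        }
    }
    *evens = e;
    *odds = o;
}
```

If both regions were left unnamed, they would be treated as the same section and every update would serialize against every other update.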
Example of Critical Directive
#include <omp.h>

int main(void)
{
    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
        #pragma omp critical
        x = x + 1;
    } /* end of parallel section */

    return 0;
}
#pragma omp atomic
Purpose
  Specifies that a specific memory location must be updated atomically
  A mini-CRITICAL section
Format
  #pragma omp atomic newline
  statement_expression
#pragma omp atomic (Cont’d)
Restriction
  An atomic statement must have one of the following forms
    x binop= expr
    x++
    ++x
    x--
    --x
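A short sketch of ATOMIC using the x++ form follows. The function name `histogram` and the constant `NBINS` are hypothetical, and the input values are assumed to be non-negative.

```c
#define NBINS 4

/* Builds a histogram of v[i] % NBINS.
   v is assumed to hold non-negative values. */
void histogram(const int *v, int n, int bins[NBINS])
{
    int i;

    for (i = 0; i < NBINS; i++)
        bins[i] = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        /* x++ form: only the update of this one memory location is protected. */
        #pragma omp atomic
        bins[v[i] % NBINS]++;
    }
}
```

A CRITICAL region would also be correct here, but it would serialize all updates, including those to different bins.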
#pragma omp ordered
Purpose
  Specifies that iterations of the enclosed loop will be executed in the same order as if they were executed on a serial processor
Format
  #pragma omp ordered newline
  structured_block
#pragma omp ordered (Cont’d)
Restrictions
  May only appear in the dynamic extent of the following directives
    for or parallel for
  Only one thread is allowed in an ordered section at any time
  An iteration of a loop must not execute the same ORDERED directive more than once, and it must not execute more than one ORDERED directive
  A loop that contains an ORDERED directive must be a loop with an ORDERED clause
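The ORDERED clause and directive work together as in the sketch below. The function name `ordered_digits` is hypothetical; the dynamic schedule with chunk size 1 is an arbitrary choice made so that iterations are likely to be spread across threads.

```c
/* Writes the last decimal digit of i*i for i = 0..n-1 into out, in loop
   order, followed by a terminating '\0'. out must hold n + 1 chars. */
void ordered_digits(char *out, int n)
{
    int pos = 0;   /* shared write position, touched only inside ordered */
    int i;

    #pragma omp parallel for ordered schedule(dynamic, 1)
    for (i = 0; i < n; i++) {
        int d = (i * i) % 10;   /* may run in parallel, in any order */

        #pragma omp ordered
        {
            out[pos++] = (char)('0' + d);   /* runs in iteration order */
        }
    }
    out[n] = '\0';
}
```

The computation of d can overlap across threads; only the append to out is forced into serial loop order.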
Directive Binding Rules
The for, SECTIONS, SINGLE, MASTER and BARRIER directives bind to the dynamically enclosing PARALLEL, if one exists.
If no parallel region is currently being executed, the directives have no effect.
The ORDERED directive binds to the dynamically enclosing for.
The ATOMIC directive enforces exclusive access with respect to ATOMIC directives in all threads, not just the current team.
Directive Binding Rules (Cont’d)
The CRITICAL directive enforces exclusive access with respect to CRITICAL directives in all threads, not just the current team.
A directive can never bind to any directive outside the closest enclosing PARALLEL.
Directive Nesting Rules
A PARALLEL directive dynamically inside another PARALLEL directive logically establishes a new team, which is composed of only the current thread unless nested parallelism is enabled.
The for, SECTIONS, and SINGLE directives that bind to the same PARALLEL are not allowed to be nested inside of each other.
The for, SECTIONS, and SINGLE directives are not permitted in the dynamic extent of CRITICAL, ORDERED and MASTER regions.
Directive Nesting Rules (Cont’d)
CRITICAL directives with the same name are not permitted to be nested inside of each other.
BARRIER directives are not permitted in the dynamic extent of for, ORDERED, SECTIONS, SINGLE, MASTER and CRITICAL regions.
MASTER directives are not permitted in the dynamic extent of for, SECTIONS and SINGLE directives.
Directive Nesting Rules (Cont’d)
ORDERED directives are not permitted in the dynamic extent of CRITICAL regions.
Any directive that is permitted when executed dynamically inside a PARALLEL region is also legal when executed outside a parallel region.
When executed dynamically outside a user-specified parallel region, the directive is executed with respect to a team composed of only the master thread.
Environment Variables
All environment variable names are uppercase. The values assigned to them are not case sensitive.
OMP_SCHEDULE
  Applies only to for and parallel for directives that have their schedule clause set to RUNTIME
    setenv OMP_SCHEDULE "guided, 4"
    setenv OMP_SCHEDULE "dynamic"
Environment Variables (Cont’d)
OMP_NUM_THREADS
  Sets the maximum number of threads to use during execution
    setenv OMP_NUM_THREADS 8
OMP_DYNAMIC
  Enables or disables dynamic adjustment of the number of threads available for execution of parallel regions
    setenv OMP_DYNAMIC TRUE
OMP_NESTED
  Enables or disables nested parallelism
    setenv OMP_NESTED TRUE