CS 838: Pervasive Parallelism
Introduction to OpenMP
Copyright 2005 Mark D. Hill
University of Wisconsin-Madison
Slides are derived from online references of Lawrence Livermore National Laboratory, National Energy Research Scientific Computing Center, University of Minnesota, OpenMP.org
Thanks!
CS 838 2(C) 2005
Outline
• Introduction
  – Motivation
  – Example
• Programming Model
  – Expressing parallelism
  – Synchronization
• Syntax
Introduction to OpenMP
• What is OpenMP?
  – Open specification for Multi-Processing
  – “Standard” API for defining multi-threaded shared-memory programs
• Header
  – Preprocessor (compiler) directives
  – Library Calls
  – Environment Variables
Motivation
• Thread libraries are hard to use
  – P-Threads/Solaris threads have many library calls for initialization, synchronization, thread creation, condition variables, etc.
  – Usually require alternate programming styles
    » Programmer must code with multiple threads in mind
• Synchronization between threads introduces a new dimension of program correctness
Motivation
• Wouldn’t it be nice to write serial programs and somehow parallelize them “automatically”?
– OpenMP can parallelize many serial programs with relatively few annotations that specify parallelism and independence
– OpenMP is a small API that hides cumbersome threading calls with simpler directives
Programming Model
• Thread-like Fork/Join model
– Typically multiple thread creation/ destruction events
– Some built-in automatic parallelization
[Figure: fork/join diagram]
Programming Models
• Data parallelism
  – Threads perform similar functions, guided by thread identifier
• Control parallelism
  – Threads perform differing functions
    » One thread for I/O, one for computation, etc.
[Figure: fork/join diagram]
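The two styles can be sketched side by side. This is a minimal illustration, not code from the course: the function names scale and sum_and_max are hypothetical, the sections directive is standard OpenMP but is not covered in these slides, and the #else branch supplies serial stand-ins so the sketch also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption for this sketch) so it builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Data parallelism: every thread runs the same code on its own slice. */
static void scale(double *a, int n, double k) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* this thread's identifier */
        int nt = omp_get_num_threads();  /* size of the team */
        for (int i = id; i < n; i += nt) /* cyclic slice chosen by thread ID */
            a[i] *= k;
    }
}

/* Control parallelism: different threads run different functions. */
static void sum_and_max(const double *a, int n, double *sum, double *max) {
    #pragma omp parallel sections
    {
        #pragma omp section
        {   /* one thread computes the sum */
            double s = 0.0;
            for (int i = 0; i < n; i++) s += a[i];
            *sum = s;
        }
        #pragma omp section
        {   /* another thread finds the maximum */
            double m = a[0];
            for (int i = 1; i < n; i++) if (a[i] > m) m = a[i];
            *max = m;
        }
    }
}
```

In scale the thread ID only selects which elements a thread touches; in sum_and_max each section is a different task, and the runtime decides which thread takes which section.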
Programming Model
Master Thread
• Thread with ID=0
• Only thread that exists in sequential regions
• Depending on implementation, may have special purpose inside parallel regions
• Some special directives affect only the master thread (like master)
[Figure: thread 0 forks into threads 0 through 7, which join back into thread 0]
Programming Model
Parallel Code Sections
• Loop annotation directives can be used to parallelize loops
• Explicit parallel regions are declared with the parallel directive
  – Start of region corresponds to N pthread_create() calls (Fork Event)
  – End of region corresponds to N pthread_join() calls (Join Event)
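For comparison, here is roughly what the fork and join events look like when written by hand with P-Threads. This is a sketch under assumptions: the team size N, the work function, and run_region are invented for illustration, and an actual OpenMP runtime is free to reuse threads rather than create them at every region.

```c
#include <pthread.h>

#define N 4  /* team size, chosen arbitrarily for this sketch */

/* Body of the "parallel region": each thread bumps its own counter. */
static void *work(void *arg) {
    *(int *)arg += 1;
    return NULL;
}

/* Runs work() once on each of N threads; returns 0 on success. */
static int run_region(int counts[N]) {
    pthread_t tid[N];
    int created = 0, rc = 0;

    while (created < N) {                      /* fork event */
        if (pthread_create(&tid[created], NULL, work, &counts[created]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    for (int i = 0; i < created; i++)          /* join event */
        pthread_join(tid[i], NULL);
    return rc;
}
```

With OpenMP, the bookkeeping in run_region collapses into a single parallel directive around the body of work().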
Programming Model
Synchronization Constructs
• Synchronization provided by OpenMP specification
  – Can declare Critical Sections in code
    » Mutual exclusion guaranteed at runtime
  – Can declare “simple” statements as atomic
  – Barrier directives
  – Lock functions
Programming Model
Directives
• Directives (preprocessor) used to express parallelism and independence to OpenMP implementation
• Some synchronization directives
  – Atomic, Critical Section, Barrier
Programming Models
Library Calls
• Library calls provide functionality that cannot be built-in at compile-time
• Mutator/Accessor functions
  – omp_[get,set]_num_threads()
• Lock/Unlock functionality
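A small sketch of the mutator/accessor pair in use. The helper name team_size is hypothetical, and the #else branch provides serial stand-ins so the sketch builds without OpenMP. The runtime may provide fewer threads than requested, so the result is a count to check rather than a guarantee.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption for this sketch) so it builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_num_threads(void)  { return 1; }
#endif

/* Requests `want` threads and reports how many the runtime actually used. */
static int team_size(int want) {
    int got = 0;
    omp_set_num_threads(want);           /* mutator: affects later parallel regions */
    #pragma omp parallel
    {
        #pragma omp master
        got = omp_get_num_threads();     /* accessor: team size, read inside the region */
    }
    return got;
}
```

Outside a parallel region omp_get_num_threads() returns 1, which is why the call sits inside the region.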
Programming Model
Environment Variables
• Provide default behavior
• Allow other processes to change behavior of OpenMP-enabled programs
• Useful for scripting:
  – setenv OMP_NUM_THREADS 4
Limitations
• OpenMP isn’t guaranteed to divide work optimally among threads
• Highly sensitive to programming style
• Overheads can be higher than hand-written threading
• Requires compiler support (use cc)
• Doesn’t parallelize code with dependencies; the programmer must ensure the annotated work is independent
OpenMP Syntax
• General syntax for OpenMP directives
    #pragma omp directive [clause…] CR
• Directive specifies type of OpenMP operation
  – Parallelization
  – Synchronization
  – Etc.
• Clauses (optional) modify semantics of Directive
• CR marks the end of the line, which ends the directive
OpenMP Syntax
• PARALLEL syntax
    #pragma omp parallel [clause…] CR
    structured_block
Ex:
    #pragma omp parallel
    {
        printf("Hello!\n");
    } // implicit barrier
Output: (N=4)
    Hello!
    Hello!
    Hello!
    Hello!
OpenMP Syntax
• DO/for Syntax (DO-Fortran, for-C)
    #pragma omp for [clause…] CR
    for_loop
Ex:
    #pragma omp parallel shared(x)
    {
        #pragma omp for private(i) \
                        schedule(static,x/N)
        for(i=0;i<x;i++) printf("Hello!\n");
    } // implicit barrier
Note: Must reside inside a parallel region to run in parallel; shared() is a clause of parallel, not of for
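A fuller sketch of the for directive applied to real work. The function vec_add is a hypothetical example, not part of the course files; it assumes the iterations are independent, which holds here because each one writes a different element of c.

```c
/* c[i] = a[i] + b[i] for i in [0, n). */
static void vec_add(const double *a, const double *b, double *c, int n) {
    int i;
    #pragma omp parallel shared(a, b, c, n) private(i)
    {
        /* The for directive splits the iterations among the team's threads. */
        #pragma omp for schedule(static)
        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }   /* implicit barrier: all of c has been written once control gets here */
}
```

If the compiler is run without OpenMP enabled, the pragmas are ignored and the loop simply runs serially with the same result.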
OpenMP Syntax
More on Clauses
• private(): A variable in private list is private to each thread
• shared(): Variables in shared list are visible to all threads
  – Implies no synchronization, or even consistency!
• schedule(): Determines how iterations will be divided among threads
  – schedule(static, C): Iterations are split into chunks of C, handed to threads round-robin
    » Usually N*C = Number of total iterations, so each thread gets one chunk
  – schedule(dynamic): Each thread will be given additional iterations as needed
    » Often less efficient than a carefully chosen static allocation
• nowait: Removes implicit barrier from end of block
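One way to see what schedule(static, C) does is to record which thread ran each iteration. The sketch below assumes a hypothetical helper record_owners and provides a serial stand-in for omp_get_thread_num() so it also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in (assumption for this sketch) so it builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* owner[i] receives the ID of the thread that executed iteration i.
   chunk must be positive. */
static void record_owners(int *owner, int n, int chunk) {
    int i;
    #pragma omp parallel for schedule(static, chunk)
    for (i = 0; i < n; i++)
        owner[i] = omp_get_thread_num();
}
```

With a static schedule, all iterations of one chunk land on the same thread and the mapping is repeatable from run to run. Changing the clause to schedule(dynamic, chunk) keeps chunks intact but lets the runtime pick the thread, so the recorded owners may differ between runs.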
OpenMP Syntax
• PARALLEL FOR (combines parallel and for)
    #pragma omp parallel for [clause…] CR
    for_loop
Ex:
    #pragma omp parallel for shared(x) \
                             private(i) \
                             schedule(dynamic)
    for(i=0;i<x;i++) {
        printf("Hello!\n");
    }
Example: AddMatrix
Files:
(Makefile)
addmatrix.c // omp-parallelized
matrixmain.c // non-omp
printmatrix.c // non-omp
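The contents of addmatrix.c are not reproduced in these slides. As a rough idea of what an OpenMP-parallelized matrix add can look like, here is a hypothetical sketch; the dimensions ROWS and COLS and the function name add_matrix are assumptions made for illustration.

```c
#define ROWS 3   /* illustrative sizes, not taken from the course files */
#define COLS 4

/* c = a + b, with rows divided among threads. */
static void add_matrix(double a[ROWS][COLS], double b[ROWS][COLS],
                       double c[ROWS][COLS]) {
    int i, j;
    /* i is the parallel loop index and is made private automatically;
       j is declared outside the loop, so it must be listed as private
       or the threads would share a single inner-loop counter. */
    #pragma omp parallel for private(j) shared(a, b, c)
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            c[i][j] = a[i][j] + b[i][j];
}
```

Only the outer loop is divided among threads; each thread runs the whole inner loop for the rows it was given.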
OpenMP Syntax
• ATOMIC syntax
    #pragma omp atomic CR
    simple_statement
Ex:
    #pragma omp parallel shared(x)
    {
        #pragma omp atomic
        x++;
    } // implicit barrier
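A sketch of atomic protecting a shared counter inside a parallel loop. The function count_negative is a hypothetical example; without the atomic directive, concurrent increments of count could be lost.

```c
/* Returns how many entries of v are negative. */
static int count_negative(const int *v, int n) {
    int count = 0;
    int i;
    #pragma omp parallel for shared(count)
    for (i = 0; i < n; i++) {
        if (v[i] < 0) {
            /* The increment is a single simple update, so atomic applies. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

Only the increment is serialized; the comparison v[i] < 0 still runs in parallel.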
OpenMP Syntax
• CRITICAL syntax
    #pragma omp critical CR
    structured_block
Ex:
    #pragma omp parallel shared(x)
    {
        #pragma omp critical
        {
            // only one thread in here
        }
    } // implicit barrier
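A sketch of a case where critical is needed rather than atomic: two shared variables must be updated together. The function find_max is hypothetical and assumes n is at least 1.

```c
/* Stores the largest value of v in *best and its position in *best_idx. */
static void find_max(const int *v, int n, int *best, int *best_idx) {
    int i;
    *best = v[0];
    *best_idx = 0;
    #pragma omp parallel for
    for (i = 1; i < n; i++) {
        /* The test and both assignments must happen as one unit. */
        #pragma omp critical
        {
            if (v[i] > *best) {
                *best = v[i];
                *best_idx = i;
            }
        }
    }
}
```

Because the whole loop body sits inside the critical section, this version is correct but gains little from the extra threads; it is meant to show the construct, not a tuned reduction. If the maximum value occurs more than once, which index is reported can vary between runs.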
OpenMP Syntax
ATOMIC vs. CRITICAL
• Use ATOMIC for “simple statements”
  – Usually lower overhead than CRITICAL
• Use CRITICAL for larger expressions
  – May involve an unseen implicit lock
OpenMP Syntax
• MASTER: only Thread 0 executes a block
  – No implied synchronization
    #pragma omp master CR
    structured_block
• SINGLE: only one thread executes a block
  – Implicit barrier at the end unless nowait is specified
    #pragma omp single CR
    structured_block
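A sketch contrasting the two directives. The function run_once and its two counters are hypothetical, used only to show that each block executes exactly once per parallel region no matter how many threads are in the team.

```c
/* Counts how many times each block runs inside one parallel region. */
static void run_once(int *single_runs, int *master_runs) {
    *single_runs = 0;
    *master_runs = 0;
    #pragma omp parallel
    {
        /* Exactly one thread, whichever reaches the construct first. */
        #pragma omp single
        (*single_runs)++;
        /* Per the OpenMP specification, single ends with an implicit
           barrier unless nowait is given, so the other threads wait here. */

        /* Thread 0 only; the other threads skip ahead without waiting. */
        #pragma omp master
        (*master_runs)++;
    }
}
```

single is a common choice for one-time setup that the rest of the team depends on, since the barrier keeps the other threads from running ahead of it.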
OpenMP Syntax
• BARRIER
    #pragma omp barrier CR
• Locks
  – Locks are provided through omp.h library calls
  – omp_init_lock()
  – omp_destroy_lock()
  – omp_test_lock()
  – omp_set_lock()
  – omp_unset_lock()
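A sketch of the lock calls guarding a shared total. The function locked_sum is hypothetical, and the #else branch defines serial stand-ins (not part of the OpenMP API) so the sketch builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption for this sketch) so it builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Sums v[0..n-1], with every update of the shared total under a lock. */
static long locked_sum(const int *v, int n) {
    long total = 0;
    int i;
    omp_lock_t lock;

    omp_init_lock(&lock);              /* a lock must be initialized before use */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        omp_set_lock(&lock);           /* blocks until the lock is available */
        total += v[i];
        omp_unset_lock(&lock);         /* release so other threads can proceed */
    }
    omp_destroy_lock(&lock);           /* release the lock's resources */
    return total;
}
```

omp_test_lock() is the non-blocking variant: it attempts to take the lock and returns immediately with a nonzero value on success, which lets a thread do other work instead of waiting.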
OpenMP Syntax
• FLUSH
    #pragma omp flush CR
• Guarantees that threads’ views of memory are consistent
• Why? Remember OpenMP directives…
  – Code generated by directives at compile-time cannot respond to dynamic events
    » Variables are not always declared as volatile
    » Using variables from registers instead of memory can seem like a consistency violation
  – Synchronization often has an implicit flush
    » ATOMIC, CRITICAL