
CS 838: Pervasive Parallelism

Introduction to OpenMP

Copyright 2005 Mark D. Hill, University of Wisconsin-Madison

Slides are derived from online references of Lawrence Livermore National Laboratory, National Energy Research Scientific Computing Center, University of Minnesota, and OpenMP.org.

Thanks!


Outline

• Introduction
  – Motivation
  – Example

• Programming Model
  – Expressing parallelism
  – Synchronization

• Syntax


Introduction to OpenMP

• What is OpenMP?
  – Open specification for Multi-Processing
  – “Standard” API for defining multi-threaded, shared-memory programs

• Consists of
  – Preprocessor (compiler) directives
  – Library calls
  – Environment variables


Motivation

• Thread libraries are hard to use
  – P-Threads/Solaris threads have many library calls for initialization, synchronization, thread creation, condition variables, etc.
  – Usually require alternate programming styles
    » Programmer must code with multiple threads in mind

• Synchronization between threads introduces a new dimension of program correctness


Motivation

• Wouldn’t it be nice to write serial programs and somehow parallelize them “automatically”?
  – OpenMP can parallelize many serial programs with relatively few annotations that specify parallelism and independence
  – OpenMP is a small API that hides cumbersome threading calls behind simpler directives


Example: Hello, World!

Files:

(Makefile)

hello.c
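The course files are not reproduced in this transcript. A minimal sketch of what hello.c plausibly contains, assuming the standard OpenMP hello-world pattern (the exact file contents are an assumption):

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      /* Fork: every thread in the team executes the block once */
      #pragma omp parallel
      {
          int id = omp_get_thread_num();   /* this thread's ID */
          int n  = omp_get_num_threads();  /* size of the team */
          printf("Hello, World! from thread %d of %d\n", id, n);
      }  /* Join: implicit barrier at the end of the parallel region */
      return 0;
  }

Built with an OpenMP-aware compiler; the exact flag is compiler-specific (e.g., -xopenmp for Sun cc, -fopenmp for gcc).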


Outline

• Introduction
  – Motivation
  – Example

• Programming Model
  – Expressing parallelism
  – Synchronization

• Syntax


Programming Model

• Thread-like Fork/Join model
  – Typically multiple thread creation/destruction events
  – Some built-in automatic parallelization

[Figure: fork/join diagram]


Programming Models

• Data parallelism
  – Threads perform similar functions, guided by thread identifier

• Control parallelism
  – Threads perform differing functions
    » One thread for I/O, one for computation, etc.

[Figure: fork/join diagram]


Programming Model

Master Thread

• Thread with ID=0

• Only thread that exists in sequential regions

• Depending on implementation, may have special purpose inside parallel regions

• Some special directives affect only the master thread (like master)

[Figure: fork/join diagram — master thread 0 forks a team of threads 0–7, which join back into thread 0]


Programming Model

Parallel Code Sections

• Loop annotation directives can be used to parallelize loops

• Explicit parallel regions are declared with the parallel directive
  – Start of region corresponds to N pthread_create() calls (Fork event)
  – End of region corresponds to N pthread_join() calls (Join event)


Programming Model

Synchronization Constructs

• Synchronization provided by the OpenMP specification
  – Can declare Critical Sections in code
    » Mutual exclusion guaranteed at runtime
  – Can declare “simple” statements as atomic
  – Barrier directives
  – Lock functions


Programming Model

Directives

• Directives (preprocessor) are used to express parallelism and independence to the OpenMP implementation

• Some synchronization directives
  – Atomic, Critical Section, Barrier


Programming Model

Library Calls

• Library calls provide functionality that cannot be resolved at compile time

• Mutator/accessor functions
  – omp_[get,set]_num_threads()

• Lock/unlock functionality
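As an illustrative sketch (not from the course files), the mutator paired with its accessor; the printed team size is whatever the runtime actually grants:

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      omp_set_num_threads(4);      /* mutator: request a team of 4 */
      #pragma omp parallel
      {
          /* accessor: returns the team size of the enclosing region */
          printf("team size = %d\n", omp_get_num_threads());
      }
      return 0;
  }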


Programming Model

Environment Variables

• Provide default behavior

• Allow other processes to change behavior of OpenMP-enabled programs

• Useful for scripting:
  – setenv OMP_NUM_THREADS 4


Limitations

• OpenMP isn’t guaranteed to divide work optimally among threads

• Highly sensitive to programming style

• Overheads higher than traditional threading

• Requires compiler support (use cc)

• Doesn’t remove data dependences; the programmer must ensure that parallelized loops have independent iterations


Outline

• Introduction
  – Motivation
  – Example

• Programming Model
  – Expressing parallelism
  – Synchronization

• Syntax


OpenMP Syntax

• General syntax for OpenMP directives:

  #pragma omp directive [clause…] CR

• Directive specifies the type of OpenMP operation
  – Parallelization
  – Synchronization
  – Etc.

• Clauses (optional) modify the semantics of the directive


OpenMP Syntax

• PARALLEL syntax:

  #pragma omp parallel [clause…] CR
  structured_block

Ex:

  #pragma omp parallel
  {
    printf("Hello!\n");
  } // implicit barrier

Output (N=4):

  Hello!
  Hello!
  Hello!
  Hello!


OpenMP Syntax

• DO/for syntax (DO in Fortran, for in C):

  #pragma omp for [clause…] CR
  for_loop

Ex:

  #pragma omp parallel
  {
    #pragma omp for private(i) shared(x) \
            schedule(static, x/N)
    for (i = 0; i < x; i++) printf("Hello!\n");
  } // implicit barrier

Note: Must reside inside a parallel section


OpenMP Syntax

More on Clauses

• private() – Variables in the private list are private to each thread

• shared() – Variables in the shared list are visible to all threads
  – Implies no synchronization, or even consistency!

• schedule() – Determines how iterations are divided among threads (see the sketch after this list)
  – schedule(static, C) – Each thread is given C iterations
    » Usually N*C = total number of iterations
  – schedule(dynamic) – Each thread is given additional iterations as needed
    » Often less efficient than a well-chosen static allocation

• nowait – Removes the implicit barrier from the end of the block
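A hedged sketch contrasting the two schedules; the iteration count X and the chunk size 4 are illustrative:

  #include <stdio.h>
  #include <omp.h>

  #define X 16  /* illustrative iteration count */

  int main(void) {
      int i;
      #pragma omp parallel
      {
          /* static: chunks of 4 handed out in a fixed round-robin up front */
          #pragma omp for private(i) schedule(static, 4)
          for (i = 0; i < X; i++)
              printf("static:  i=%2d on thread %d\n", i, omp_get_thread_num());

          /* dynamic: each thread grabs the next iteration when it is free */
          #pragma omp for private(i) schedule(dynamic)
          for (i = 0; i < X; i++)
              printf("dynamic: i=%2d on thread %d\n", i, omp_get_thread_num());
      }
      return 0;
  }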


OpenMP Syntax

• PARALLEL FOR (combines parallel and for):

  #pragma omp parallel for [clause…] CR
  for_loop

Ex:

  #pragma omp parallel for shared(x) \
          private(i) \
          schedule(dynamic)
  for (i = 0; i < x; i++) {
    printf("Hello!\n");
  }


Example: AddMatrix

Files:

(Makefile)

addmatrix.c // omp-parallelized

matrixmain.c // non-omp

printmatrix.c // non-omp
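addmatrix.c itself is not shown in the transcript; a plausible hedged sketch of the OMP-parallelized kernel (the names and the dimension N are assumptions):

  #include <omp.h>

  #define N 100  /* assumed matrix dimension */

  /* C = A + B; the parallel for splits the outer (row) loop across threads */
  void addmatrix(double A[N][N], double B[N][N], double C[N][N]) {
      int i, j;
      #pragma omp parallel for private(i, j) shared(A, B, C)
      for (i = 0; i < N; i++)
          for (j = 0; j < N; j++)
              C[i][j] = A[i][j] + B[i][j];
  }

Each iteration writes a distinct C[i][j], so the iterations are independent, which is exactly the property the programmer must guarantee to OpenMP.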


OpenMP Syntax

• ATOMIC syntax:

  #pragma omp atomic CR
  simple_statement

Ex:

  #pragma omp parallel shared(x)
  {
    #pragma omp atomic
    x++;
  } // implicit barrier


OpenMP Syntax

• CRITICAL syntax:

  #pragma omp critical CR
  structured_block

Ex:

  #pragma omp parallel shared(x)
  {
    #pragma omp critical
    {
      // only one thread in here
    }
  } // implicit barrier


OpenMP Syntax

ATOMIC vs. CRITICAL

• Use ATOMIC for “simple statements”
  – Usually lower overhead than CRITICAL

• Use CRITICAL for larger expressions
  – May involve an unseen implicit lock
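A small sketch of this rule of thumb, with illustrative shared counters: ATOMIC for the single update, CRITICAL for the compound one:

  #include <omp.h>

  int count = 0;           /* illustrative shared state */
  int sum = 0, max = 0;

  /* assumed to be called from within a parallel region */
  void record(int v) {
      #pragma omp atomic
      count++;             /* simple statement: ATOMIC is enough */

      #pragma omp critical
      {
          /* compound update: the two statements must not interleave
             with another thread's, so CRITICAL is required */
          sum += v;
          if (v > max) max = v;
      }
  }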


OpenMP Syntax

• MASTER – only thread 0 executes the block:

  #pragma omp master CR
  structured_block

• SINGLE – exactly one thread (not necessarily thread 0) executes the block:

  #pragma omp single CR
  structured_block

• No synchronization implied on entry (note that SINGLE carries an implicit barrier at its end unless nowait is given)
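A brief sketch of the difference; the printf bodies are illustrative:

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      #pragma omp parallel
      {
          #pragma omp master
          printf("master: always thread 0\n");   /* other threads skip it */

          #pragma omp single
          printf("single: exactly one (unspecified) thread\n");
          /* other threads wait at SINGLE's implicit exit barrier
             unless nowait is specified */
      }
      return 0;
  }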


OpenMP Syntax

• BARRIER:

  #pragma omp barrier CR

• Locks – provided through omp.h library calls
  – omp_init_lock()
  – omp_destroy_lock()
  – omp_test_lock()
  – omp_set_lock()
  – omp_unset_lock()
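A hedged sketch of the lock calls in use; the counter being protected is illustrative:

  #include <omp.h>

  int main(void) {
      omp_lock_t lock;
      int count = 0;

      omp_init_lock(&lock);          /* create the lock */
      #pragma omp parallel shared(count)
      {
          omp_set_lock(&lock);       /* block until acquired */
          count++;                   /* protected update */
          omp_unset_lock(&lock);     /* release */
      }
      omp_destroy_lock(&lock);       /* free the lock */
      return 0;
  }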


OpenMP Syntax

• FLUSH:

  #pragma omp flush CR

• Guarantees that threads’ views of memory are consistent

• Why? Remember, OpenMP works through compile-time directives…
  – Code generated by directives at compile time cannot respond to dynamic events
    » Variables are not always declared as volatile
    » Values kept in registers instead of memory can look like a consistency violation
  – Synchronization often has an implicit flush
    » ATOMIC, CRITICAL
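A sketch of the classic hand-rolled flag handshake where FLUSH matters; the busy-wait idiom is fragile and is shown only to illustrate the directive (data and flag are illustrative):

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int data = 0, flag = 0;
      #pragma omp parallel sections shared(data, flag)
      {
          #pragma omp section
          {                               /* producer */
              data = 42;
              #pragma omp flush(data)     /* publish data before the flag */
              flag = 1;
              #pragma omp flush(flag)
          }
          #pragma omp section
          {                               /* consumer */
              int seen;
              do {
                  #pragma omp flush(flag) /* force a re-read from memory */
                  seen = flag;
              } while (!seen);
              #pragma omp flush(data)     /* now data is guaranteed visible */
              printf("got %d\n", data);
          }
      }
      return 0;
  }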


Example: Synchronization

Files:

(Makefile)

increment.c
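increment.c is not reproduced here; a plausible hedged sketch combining the synchronization constructs above (the file's actual contents are an assumption):

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int x = 0;
      #pragma omp parallel shared(x)
      {
          #pragma omp atomic
          x++;                     /* each thread safely adds one */

          #pragma omp barrier      /* wait until every increment lands */

          #pragma omp master
          printf("x = %d\n", x);   /* thread 0 reports the final count */
      }
      return 0;
  }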


OpenMP Syntax

• Functions

omp_set_num_threads()

omp_get_num_threads()

omp_get_max_threads()

omp_get_num_procs()

omp_get_thread_num()

omp_set_dynamic()

omp_[init,destroy,test,set,unset]_lock()
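A short sketch exercising these calls; the printed values depend on the machine and the runtime:

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      omp_set_dynamic(0);            /* don't let the runtime shrink teams */
      omp_set_num_threads(omp_get_num_procs());  /* one thread per processor */

      printf("procs=%d  max threads=%d\n",
             omp_get_num_procs(), omp_get_max_threads());

      #pragma omp parallel
      printf("thread %d of %d\n",
             omp_get_thread_num(), omp_get_num_threads());

      return 0;
  }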