Chapter 17: Shared-Memory Programming

Transcript of Chapter 17

Page 1: Chapter 17

Chapter 17

Shared-Memory Programming

Page 2: Chapter 17

Introduction

OpenMP is an application programming interface (API) for parallel programming on multiprocessors. It consists of a set of compiler directives and a library of support functions.

Page 3: Chapter 17
Page 4: Chapter 17

fork/join parallelism: at the start of a parallel region the master thread forks additional threads; at the end of the region the threads join and only the master continues.

Page 5: Chapter 17

Incremental parallelization: the process of transforming a sequential program into a parallel program one block of code at a time.
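A minimal sketch of one incremental step (illustrative; the arrays, size N, and initial values are made up): a single hot loop is parallelized with one pragma while the rest of the program stays sequential.

#include <stdio.h>
#define N 1000

int main(void) {
    static double a[N], b[N], c[N];
    int i;
    for (i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    /* Originally a sequential loop; the incremental step adds a single
       pragma to this block while everything else is left unchanged. */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %.0f\n", c[N-1]);
    return 0;
}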

Page 6: Chapter 17
Page 7: Chapter 17

OpenMP compiler directives

parallel
for
parallel for
sections
parallel sections
critical
single

Page 8: Chapter 17

OpenMP functions

int omp_get_num_procs (void)
int omp_get_num_threads (void)
int omp_get_thread_num (void)
void omp_set_num_threads (int t)
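A small sketch (illustrative, not from the slides) showing how these four functions are typically used together:

#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Ask how many processors are available and request that many threads. */
    int p = omp_get_num_procs();
    omp_set_num_threads(p);

    #pragma omp parallel
    {
        /* Inside the parallel region each thread has an ID in [0, t). */
        int id = omp_get_thread_num();
        int t  = omp_get_num_threads();
        printf("thread %d of %d (processors: %d)\n", id, t, p);
    }
    return 0;
}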

Page 9: Chapter 17

Parallel for Loops

for (i = first; i < size; i += prime) marked[i] = 1;

Page 10: Chapter 17

parallel for Pragma

Pragma: a compiler directive in C or C++ is called a pragma; the name is short for "pragmatic information".

Syntax: #pragma omp <rest of pragma>

e.g. #pragma omp parallel for

for (i = first; i < size; i += prime) marked[i] = 1;
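A self-contained sketch of this loop (the array size and the values of prime and first are arbitrary, chosen only for illustration):

#include <stdio.h>
#define SIZE 30

int main(void) {
    int marked[SIZE] = {0};
    int prime = 3;
    int first = 2 * prime;   /* hypothetical starting point */
    int i;

    /* Each iteration writes a distinct element of marked[], so the
       iterations are independent and can be divided among threads. */
    #pragma omp parallel for
    for (i = first; i < SIZE; i += prime)
        marked[i] = 1;

    for (i = 0; i < SIZE; i++)
        if (marked[i]) printf("%d ", i);
    printf("\n");
    return 0;
}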

Page 11: Chapter 17
Page 12: Chapter 17

Execution context: every thread has its own execution context, an address space containing all of the variables the thread may access. The execution context includes static variables, dynamically allocated data structures in the heap, and variables on the run-time stack.

Shared variable: has the same address in the execution context of every thread; all threads access the single copy.

Private variable: has a different address in each thread's execution context; a thread can access its own copy but not another thread's copy.
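A minimal sketch contrasting the two (illustrative; the variable names are made up):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int shared_total = 0;   /* declared before the region: one shared copy */

    #pragma omp parallel
    {
        /* Declared inside the region: every thread gets its own copy. */
        int private_id = omp_get_thread_num();

        /* All threads update the same shared copy, so the update is
           protected with a critical section. */
        #pragma omp critical
        shared_total += private_id;
    }
    printf("sum of thread IDs = %d\n", shared_total);
    return 0;
}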

Page 13: Chapter 17
Page 14: Chapter 17

Declaring Private Variables

private Clause (a clause is an optional component of a pragma)

e.g. private (<variable list>)

#pragma omp parallel for private(j)
for (i = 0; i <= BLOCK_SIZE(id,p,n); i++)
    for (j = 0; j < n; j++)
        a[i][j] = MIN(a[i][j], a[i][k] + tmp[j]);

Page 15: Chapter 17

firstprivate Clause

x[0] = complex_function();
#pragma omp parallel for private(j) firstprivate(x)
for (i = 0; i < n; i++) {
    for (j = 1; j < 4; j++)
        x[j] = g(i, x[j-1]);
    answer[i] = x[1] - x[3];
}

Page 16: Chapter 17

lastprivate Clause

#pragma omp parallel for private(j) lastprivate(x)
for (i = 0; i < n; i++) {
    x[0] = 1.0;
    for (j = 1; j < 4; j++)
        x[j] = x[j-1] * (i + 1);
    sum_of_powers[i] = x[0] + x[1] + x[2] + x[3];
}
n_cubed = x[3];
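A self-contained version of this fragment, assuming n = 8 and hypothetical arrays, that can be compiled and run to confirm that x[3] ends up holding the value from the sequentially last iteration:

#include <stdio.h>

int main(void) {
    int i, j, n = 8;
    double x[4], sum_of_powers[8], n_cubed;

    /* lastprivate(x): after the loop, x holds the values written by the
       thread that executed the sequentially last iteration (i == n-1). */
    #pragma omp parallel for private(j) lastprivate(x)
    for (i = 0; i < n; i++) {
        x[0] = 1.0;
        for (j = 1; j < 4; j++)
            x[j] = x[j-1] * (i + 1);
        sum_of_powers[i] = x[0] + x[1] + x[2] + x[3];
    }
    n_cubed = x[3];   /* equals n*n*n because the last iteration set it */

    printf("n_cubed = %.0f\n", n_cubed);
    return 0;
}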

Page 17: Chapter 17

Critical Sections

#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
    area += 4.0 / (1.0 + x*x);   /* Race condition! */
}
pi = area / n;

Page 18: Chapter 17
Page 19: Chapter 17

#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
#pragma omp critical
    area += 4.0 / (1.0 + x*x);
}
pi = area / n;

Page 20: Chapter 17

Reductions

Syntax: reduction (<op> : <variable>)

#pragma omp parallel for private(x) reduction(+:area)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
    area += 4.0 / (1.0 + x*x);
}
pi = area / n;
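The fragment assembled into a runnable program (the number of intervals n is arbitrary):

#include <stdio.h>

int main(void) {
    int i, n = 1000000;
    double x, area = 0.0, pi;

    /* Each thread accumulates a private copy of area; the copies are
       combined with + when the loop finishes. */
    #pragma omp parallel for private(x) reduction(+:area)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x * x);
    }
    pi = area / n;
    printf("pi is approximately %.10f\n", pi);
    return 0;
}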

Page 21: Chapter 17
Page 22: Chapter 17

Performance Improvement: Inverting Loops

e.g.
for (i = 1; i < m; i++)
    for (j = 0; j < n; j++)
        a[i][j] = 2 * a[i-1][j];

Page 23: Chapter 17
Page 24: Chapter 17

#pragma omp parallel for private(i)
for (j = 0; j < n; j++)
    for (i = 1; i < m; i++)
        a[i][j] = 2 * a[i-1][j];

Page 25: Chapter 17

Conditionally Executing Loops

#pragma omp parallel for private(x) reduction(+:area) if(n > 5000)
for (i = 0; i < n; i++) {
    x = (i + 0.5) / n;
    area += 4.0 / (1.0 + x*x);
}
pi = area / n;

Page 26: Chapter 17
Page 27: Chapter 17

Scheduling Loops

Syntax: schedule (<type> [, <chunk>])

schedule(static): a static allocation of about n/t contiguous iterations to each thread.

schedule(static, C): an interleaved allocation of chunks to tasks. Each chunk contains C contiguous iterations.

Page 28: Chapter 17

schedule(dynamic): iterations are dynamically allocated, one at a time, to threads.

schedule(dynamic, C): a dynamic allocation of C iterations at a time to the tasks.

schedule(guided, C): a dynamic allocation of iterations to tasks using the guided self-scheduling heuristic. Guided self-scheduling begins by allocating a large chunk size to each task and responds to further requests for chunks by allocating chunks of decreasing size. The size of the chunks decreases exponentially to a minimum chunk size of C.

Page 29: Chapter 17

schedule(guided): guided self-scheduling with a minimum chunk size of 1.

schedule(runtime): the schedule type is chosen at run time based on the value of the environment variable OMP_SCHEDULE.

e.g. setenv OMP_SCHEDULE "static,1" would set the run-time schedule to be an interleaved allocation.
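A sketch of how a schedule clause is attached to a loop, assuming a hypothetical loop whose iterations get progressively more expensive (the arrays and the chunk size 4 are arbitrary choices):

#include <stdio.h>
#define N 1000

int main(void) {
    static double a[N], b[N];
    int i, j;

    for (i = 0; i < N; i++) b[i] = 1.0;

    /* Iteration i does i units of work, so later iterations are much more
       expensive; schedule(dynamic, 4) hands out chunks of 4 iterations on
       demand, which balances the load better than the default. */
    #pragma omp parallel for private(j) schedule(dynamic, 4)
    for (i = 0; i < N; i++)
        for (j = 0; j < i; j++)
            a[i] += b[j];

    printf("a[N-1] = %.0f\n", a[N-1]);
    return 0;
}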

Page 30: Chapter 17

More General Data Parallelism

Page 31: Chapter 17
Page 32: Chapter 17

parallel Pragma

The parallel pragma precedes a block of code that should be executed by all of the threads in the team.

Page 33: Chapter 17

for Pragma

The for pragma, used inside a parallel region, divides the iterations of the loop that follows among the threads that already exist.

Page 34: Chapter 17

single Pragma

The single pragma marks a block of code that should be executed by only one thread in the team.

Page 35: Chapter 17

nowait Clause

The nowait clause removes the implied barrier at the end of a work-sharing construct such as for or single, so threads that finish early may continue without waiting; see the sketch below.
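A minimal sketch combining the constructs above (illustrative; the arrays and size are made up, and nowait is safe here only because the second loop does not read a[]):

#include <omp.h>
#include <stdio.h>
#define N 100

int main(void) {
    static double a[N], b[N];
    int i;

    #pragma omp parallel          /* one team of threads for all the work */
    {
        /* for pragma: iterations are divided among the existing threads.
           nowait: threads that finish may start the next loop at once. */
        #pragma omp for nowait
        for (i = 0; i < N; i++)
            a[i] = i;

        #pragma omp for           /* independent of a[], so the missing
                                     barrier above causes no race */
        for (i = 0; i < N; i++)
            b[i] = 2 * i;

        /* single pragma: exactly one thread prints the message. */
        #pragma omp single
        printf("both arrays initialized\n");
    }
    printf("a[N-1] = %.0f, b[N-1] = %.0f\n", a[N-1], b[N-1]);
    return 0;
}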

Page 36: Chapter 17

Functional Parallelism

e.g.

v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf("%6.2f\n", epsilon(x, y));

Page 37: Chapter 17
Page 38: Chapter 17

#pragma omp parallel sections
{
#pragma omp section   /* This pragma optional */
    v = alpha();
#pragma omp section
    w = beta();
#pragma omp section
    y = delta();
}
x = gamma(v, w);
printf("%6.2f\n", epsilon(x, y));

Page 39: Chapter 17

#pragma omp parallel
{
#pragma omp sections
    {
#pragma omp section
        v = alpha();
#pragma omp section
        w = beta();
    }
#pragma omp sections
    {
#pragma omp section
        x = gamma(v, w);
#pragma omp section
        y = delta();
    }
}
printf("%6.2f\n", epsilon(x, y));
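A self-contained version of the first sections example, with placeholder implementations of the five functions (gamma is renamed gamma_ here only to avoid a clash with the legacy gamma() in some C libraries):

#include <omp.h>
#include <stdio.h>

/* Placeholder bodies so the sketch compiles; the real alpha() .. epsilon()
   would each do substantial independent work. */
double alpha(void)                 { return 1.0; }
double beta(void)                  { return 2.0; }
double gamma_(double v, double w)  { return v + w; }
double delta(void)                 { return 3.0; }
double epsilon(double x, double y) { return x * y; }

int main(void) {
    double v, w, x, y;

    /* alpha(), beta() and delta() are independent, so they can run as
       three parallel sections; gamma_() must wait for v and w. */
    #pragma omp parallel sections
    {
        #pragma omp section
        v = alpha();
        #pragma omp section
        w = beta();
        #pragma omp section
        y = delta();
    }
    x = gamma_(v, w);
    printf("%6.2f\n", epsilon(x, y));
    return 0;
}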

Page 40: Chapter 17
Page 41: Chapter 17

Example : Hello world

Write a multithreaded program that prints “hello world”.

#include <omp.h>
#include <stdio.h>
int main() {
#pragma omp parallel
    {
        int ID = 0;
        printf(" hello(%d) ", ID);
        printf(" world(%d) \n", ID);
    }
}

Page 42: Chapter 17

Example : Hello world

Write a multithreaded program where each thread prints “hello world”.

#include <omp.h>
#include <stdio.h>
int main() {
#pragma omp parallel
    {
        int ID = omp_get_thread_num();
        printf(" hello(%d) ", ID);
        printf(" world(%d) \n", ID);
    }
}

Notes on the code:

omp.h: OpenMP include file.

#pragma omp parallel: parallel region with the default number of threads.

Closing brace: end of the parallel region.

omp_get_thread_num(): runtime library function to return a thread ID.

Sample Output:

hello(1) hello(0) world(1)

world(0)

hello (3) hello(2) world(3)

world(2)

Page 43: Chapter 17

Experimentation

Write a function that implements a sequential matrix-vector product in C.

Write a function that implements the same matrix-vector product using OpenMP (or MPI) in C.

Compare the performance of the two functions.
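A sketch of such an experiment using OpenMP only (the matrix size, initial values, and timing with omp_get_wtime() are choices made here, not specified by the slides):

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Sequential matrix-vector product: y = A * x, A stored row-major. */
void matvec_seq(int rows, int cols, double *A, double *x, double *y) {
    for (int i = 0; i < rows; i++) {
        double sum = 0.0;
        for (int j = 0; j < cols; j++)
            sum += A[i * cols + j] * x[j];
        y[i] = sum;
    }
}

/* OpenMP version: the outer-loop iterations are independent, so they
   are divided among the threads with a parallel for. */
void matvec_omp(int rows, int cols, double *A, double *x, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < rows; i++) {
        double sum = 0.0;
        for (int j = 0; j < cols; j++)
            sum += A[i * cols + j] * x[j];
        y[i] = sum;
    }
}

int main(void) {
    int n = 2000;                        /* problem size is arbitrary */
    double *A = malloc((size_t)n * n * sizeof(double));
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    for (int i = 0; i < n * n; i++) A[i] = 1.0;
    for (int i = 0; i < n; i++) x[i] = 1.0;

    double t0 = omp_get_wtime();
    matvec_seq(n, n, A, x, y);
    double t1 = omp_get_wtime();
    matvec_omp(n, n, A, x, y);
    double t2 = omp_get_wtime();

    printf("sequential: %f s, OpenMP: %f s, y[0] = %f\n",
           t1 - t0, t2 - t1, y[0]);
    free(A); free(x); free(y);
    return 0;
}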