Art of Multiprocessor Programming

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

• You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work

• Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.

• Any of the above conditions can be waived if you get permission from the copyright holder.

• Nothing in this license impairs or restricts the author's moral rights.


Page 2: Parallel and distributed programming

Adam Piotrowski
Grzegorz Jabłoński

Lecture I

Page 3: Parallel and distributed programming

lux.dmcs.pl/padp

• Grading policy
• Lecture slides and material
• Additional material

Page 4: Course Overview

• Introduction to multiprocessor programming
– Using OpenMP
• Distributed programming
– MPI, CORBA programming
• Theoretical approach to multiprocessor programming
– Fundamentals: models, algorithms
– Real-world programming: architectures, techniques

Page 5: Books

• The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit
• Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost and Ruud van der Pas

Page 6: Still on some of your desktops: The Uniprocessor

[Figure: a single cpu connected to memory]

Page 7: In the Enterprise: The Shared Memory Multiprocessor (SMP)

[Figure: several processors, each with its own cache, connected by a bus to shared memory]

Page 8: Traditional Scaling Process

[Figure: user code on a traditional uniprocessor; speedup 1.8x, 3.6x, 7x over time (Moore’s law)]

Page 9: Multicore Scaling Process

[Figure: user code on multicore; speedup 1.8x, 3.6x, 7x]

Unfortunately, not so simple…

Page 10: Real-World Scaling Process

[Figure: user code on multicore; speedup 1.8x, 2x, 2.9x]

Parallelization and synchronization require great care…

Page 11: Sequential Computation

[Figure: one thread operating on objects in memory]

Page 12: Concurrent Computation

[Figure: several threads operating on objects in memory]

Page 13: Asynchrony

• Sudden unpredictable delays
– Cache misses (short)
– Page faults (long)
– Scheduling quantum used up (really long)

Page 14: Model Summary

• Multiple threads
– Sometimes called processes
• Single shared memory
• Objects live in memory
• Unpredictable asynchronous delays

Page 15: Parallel Primality Testing

• Challenge
– Print primes from 1 to 10^10
• Given
– Ten-processor multiprocessor
– One thread per processor
• Goal
– Get ten-fold speedup (or close)

Page 16: Load Balancing

• Split the work evenly
• Each thread tests range of 10^9

[Figure: the range 1 … 10^10 cut at 10^9, 2·10^9, … into blocks assigned to threads P0, P1, …, P9]

Page 17: Procedure for Thread i

void primePrint() {
  int i = ThreadID.get(); // IDs in {0..9}
  long block = 1_000_000_000L; // 10^9 numbers per thread
  for (long j = i * block + 1; j < (i + 1) * block; j++) {
    if (isPrime(j))
      print(j);
  }
}
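A scaled-down, runnable sketch of the static split described on this slide. It is an illustration under assumptions, not the course's code: the class name `StaticPartitionPrimes`, the trial-division `isPrime`, and the small limit are hypothetical, and the threads count primes instead of printing them so the result is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class StaticPartitionPrimes {
    // Trial division; adequate for a small demonstration range.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Splits [1, limit] into equal blocks, one block per thread,
    // and returns the total number of primes found.
    static long countPrimes(long limit, int threads) {
        long block = limit / threads; // assumes threads divides limit
        AtomicLong found = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final long lo = i * block + 1;
            final long hi = (i + 1) * block;
            Thread t = new Thread(() -> {
                long local = 0;
                for (long j = lo; j <= hi; j++) {
                    if (isPrime(j)) local++;
                }
                found.addAndGet(local); // publish this block's count once
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return found.get();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100_000, 10)); // prints 9592
    }
}
```

Each thread's block is fixed before any work starts, so a thread that finishes early simply idles.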

Pages 18-19: Issues

• Higher ranges have fewer primes
• Yet larger numbers harder to test
• Thread workloads
– Uneven
– Hard to predict
• Need dynamic load balancing

[Slide stamp: rejected]

Page 20: Shared Counter

[Figure: a shared counter holding 17, 18, 19; each thread takes a number]

Pages 21-24: Procedure for Thread i

Counter counter = new Counter(1); // shared counter object

void primePrint() {
  long j = 0;
  while (j < 10_000_000_000L) { // 10^10: stop when every value taken
    j = counter.getAndIncrement(); // increment & return each new value
    if (isPrime(j))
      print(j);
  }
}
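A runnable sketch of the shared-counter scheme, under stated assumptions: it substitutes `java.util.concurrent.atomic.AtomicLong` for the slide's `Counter`, uses a small limit, and counts primes rather than printing them. The class name `SharedCounterPrimes` and the helper `isPrime` are hypothetical. Unlike the slide's loop, it checks the bound after claiming a number, so no thread tests a value past the limit.

```java
import java.util.concurrent.atomic.AtomicLong;

class SharedCounterPrimes {
    // Trial division; adequate for a small demonstration range.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Every thread repeatedly claims the next untested number from one counter.
    static long countPrimes(long limit, int threads) {
        AtomicLong counter = new AtomicLong(1); // next number to test
        AtomicLong found = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long j;
                // Stop once the claimed number falls outside the range.
                while ((j = counter.getAndIncrement()) <= limit) {
                    if (isPrime(j)) found.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return found.get();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(10_000, 4)); // prints 1229
    }
}
```

Because work is handed out one number at a time, a thread that draws cheap numbers just comes back for more, which is the dynamic load balancing the earlier slide asked for.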

Pages 25-26: Counter Implementation

public class Counter {
  private long value;

  public long getAndIncrement() {
    return value++; // OK for single thread, not for concurrent threads
  }
}

Pages 27-28: What It Means

The statement "return value++;" stands for three steps:

temp = value;
value = value + 1;
return temp;

Page 29: Not so good…

[Figure: timeline of the shared value, initially 1. One thread reads 1 and a second thread also reads 1. The first writes 2, then reads 2 and writes 3. The second then writes 2. The value goes 2, 3, 2.]
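The interleaving on this slide can be replayed deterministically in a single thread. This is a sketch, assuming the schedule reads as: both threads read 1, the first completes two increments, then the second writes its stale result. The thread labels A and B and the class name `LostUpdateReplay` are hypothetical.

```java
class LostUpdateReplay {
    // Replays the slide's schedule step by step and returns the final value.
    static long replay() {
        long value = 1;          // shared value starts at 1

        long tempA = value;      // A: read 1
        long tempB = value;      // B: read 1
        value = tempA + 1;       // A: write 2
        long tempA2 = value;     // A: read 2
        value = tempA2 + 1;      // A: write 3
        value = tempB + 1;       // B: write 2 (computed from its stale read)

        return value;
    }

    public static void main(String[] args) {
        // Three increments were attempted, so a correct counter would hold 4.
        System.out.println(replay()); // prints 2
    }
}
```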

Page 30: Is this problem inherent?

[Figure: the read and write steps of two threads interleaved over time]

If we could only glue reads and writes…

Pages 31-32: Challenge

public class Counter {
  private long value;

  public long getAndIncrement() {
    // Make these steps atomic (indivisible)
    long temp = value;
    value = temp + 1;
    return temp;
  }
}

Page 33: Hardware Solution

Perform the three steps of getAndIncrement() with a single ReadModifyWrite() instruction.
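In Java, one way to get an atomic read-modify-write without writing a lock is `java.util.concurrent.atomic.AtomicLong`, whose `getAndIncrement()` is specified to be atomic. The sketch below is an illustration, not the slides' code; the class name `AtomicCounterDemo` is hypothetical, and how the call maps to a hardware instruction depends on the JVM and platform.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounterDemo {
    // Starts `threads` threads that each increment the counter `perThread` times,
    // then returns the final counter value.
    static long run(int threads, int perThread) {
        AtomicLong value = new AtomicLong(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    value.getAndIncrement(); // read, add one, write: one atomic step
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 50_000)); // prints 200000
    }
}
```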

Pages 34-36: An Aside: Java™

public class Counter {
  private long value;

  public long getAndIncrement() {
    long temp;
    synchronized (this) { // synchronized block: mutual exclusion
      temp = value;
      value = temp + 1;
    }
    return temp;
  }
}

Page 37: Why do we care?

• We want as much of the code as possible to execute concurrently (in parallel)
• A larger sequential part implies reduced performance
• Amdahl’s law: this relation is not linear…

Page 38: Amdahl’s Law

$$\text{Speedup} = \frac{\text{OldExecutionTime}}{\text{NewExecutionTime}}$$

…of computation given n CPUs instead of 1

Pages 39-42: Amdahl’s Law

$$\text{Speedup} = \frac{1}{(1 - p) + \frac{p}{n}}$$

• p: parallel fraction
• 1 - p: sequential fraction
• n: number of processors
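One consequence worth spelling out, as a worked step that is not on the slides: with p the parallel fraction and n the number of processors, letting n grow without bound leaves only the sequential term.

```latex
\lim_{n \to \infty} \frac{1}{(1 - p) + \frac{p}{n}} = \frac{1}{1 - p}
```

For example, with p = 0.9 the speedup can never exceed 10, however many processors are added.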

Pages 43-44: Example

• Ten processors
• 60% concurrent, 40% sequential
• How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.6 + \frac{0.6}{10}} = 2.17$$

Pages 45-46: Example

• Ten processors
• 80% concurrent, 20% sequential
• How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.8 + \frac{0.8}{10}} = 3.57$$

Pages 47-48: Example

• Ten processors
• 90% concurrent, 10% sequential
• How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.9 + \frac{0.9}{10}} = 5.26$$

Pages 49-50: Example

• Ten processors
• 99% concurrent, 1% sequential
• How close to 10-fold speedup?

$$\text{Speedup} = \frac{1}{1 - 0.99 + \frac{0.99}{10}} = 9.17$$
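The four examples can be checked with a few lines of Java. This is a small sketch; the class name `Amdahl` and the method `speedup` are hypothetical names chosen for illustration.

```java
import java.util.Locale;

class Amdahl {
    // Speedup on n processors when a fraction p of the work is parallel.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double[] fractions = { 0.6, 0.8, 0.9, 0.99 };
        for (double p : fractions) {
            // Locale.ROOT keeps the decimal point independent of system settings.
            System.out.println(String.format(Locale.ROOT, "p = %.2f -> %.2f", p, speedup(p, 10)));
        }
        // prints speedups 2.17, 3.57, 5.26 and 9.17, matching the slides
    }
}
```

Even at 99% parallel code, ten processors fall short of a ten-fold speedup.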

Page 51: The Moral

• Making good use of our multiple processors (cores) means
– Finding ways to effectively parallelize our code
• Minimize sequential parts
• Reduce idle time in which threads wait without

Page 52: Implementing a Counter

public class Counter {
  private long value;

  public long getAndIncrement() {
    // Make these steps indivisible using locks
    long temp = value;
    value = temp + 1;
    return temp;
  }
}

Pages 53-55: Locks (Mutual Exclusion)

public interface Lock {
  public void lock();   // acquire lock
  public void unlock(); // release lock
}

Pages 56-59: Using Locks

public class Counter {
  private long value;
  private Lock lock;

  public long getAndIncrement() {
    lock.lock(); // acquire lock
    try {
      // critical section
      long temp = value;
      value = value + 1;
      return temp;
    } finally {
      lock.unlock(); // release lock (no matter what)
    }
  }
}
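A runnable version of the lock-based counter, as a sketch under one assumption: the slides leave the `Lock` implementation open, so this uses `java.util.concurrent.locks.ReentrantLock` from the standard library. The class name `LockedCounter` and the helper `run` are hypothetical.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockedCounter {
    private long value;
    private final Lock lock = new ReentrantLock();

    long getAndIncrement() {
        lock.lock();             // acquire lock
        try {
            long temp = value;   // critical section
            value = temp + 1;
            return temp;
        } finally {
            lock.unlock();       // release lock, even if the body throws
        }
    }

    // Runs `threads` threads of `perThread` increments each; returns the next value.
    static long run(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.getAndIncrement();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.getAndIncrement(); // equals the number of earlier increments
    }

    public static void main(String[] args) {
        System.out.println(run(4, 50_000)); // prints 200000
    }
}
```

Declaring `temp` inside the `try` block and returning from there keeps the variable in scope; the `finally` clause still runs before the method returns.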

Page 60: Today: Revisit Mutual Exclusion

• Think of performance, not just correctness and progress
• Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware
• And get to know a collection of locking algorithms…

Pages 61-62: What should you do if you can’t get a lock?

• Keep trying (our focus)
– “spin” or “busy-wait”
– Good if delays are short
• Give up the processor
– Good if delays are long
– Always good on uniprocessor

Pages 63-64: Basic Spin-Lock

[Figure: threads spin on the lock; one at a time enters the critical section (CS) and resets the lock upon exit]

…lock introduces sequential bottleneck

Page 65: Review: Test-and-Set

• Boolean value
• Test-and-set (TAS)
– Swap true with current value
– Return value tells if prior value was true or false
• Can reset just by writing false

Pages 66-67: Review: Test-and-Set

public class AtomicBoolean {
  boolean value;

  public synchronized boolean getAndSet(boolean newValue) {
    // swap old and new values
    boolean prior = value;
    value = newValue;
    return prior;
  }
}

Pages 68-69: Review: Test-and-Set

AtomicBoolean lock = new AtomicBoolean(false);
…
boolean prior = lock.getAndSet(true); // swapping in true is called “test-and-set” or TAS
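A short single-threaded demonstration of what test-and-set returns, using the real `java.util.concurrent.atomic.AtomicBoolean` from the standard library. The class name `TestAndSetDemo` is hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

class TestAndSetDemo {
    // Returns the prior values seen by three successive test-and-set calls.
    static boolean[] demo() {
        AtomicBoolean lock = new AtomicBoolean(false);
        boolean first = lock.getAndSet(true);  // lock was free: prior value false
        boolean second = lock.getAndSet(true); // lock already taken: prior value true
        lock.set(false);                       // reset by writing false
        boolean third = lock.getAndSet(true);  // free again: prior value false
        return new boolean[] { first, second, third };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [false, true, false]
    }
}
```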

Page 70: Test-and-Set Locks

• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false

Pages 71-74: Test-and-set Lock

class TASlock {
  AtomicBoolean state = new AtomicBoolean(false); // lock state is AtomicBoolean

  void lock() {
    while (state.getAndSet(true)) {} // keep trying until lock acquired
  }

  void unlock() {
    state.set(false); // release lock by resetting state to false
  }
}
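To see the test-and-set lock protect a critical section, the sketch below wraps it around a plain `long` counter. It assumes the standard `java.util.concurrent.atomic.AtomicBoolean`; the outer class name `TASLockDemo` and the helper `run` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TASLockDemo {
    static class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);

        void lock() {
            while (state.getAndSet(true)) {} // spin until the prior value was false
        }

        void unlock() {
            state.set(false);
        }
    }

    private static long value; // plain field, guarded only by the lock

    // Runs `threads` threads of `perThread` increments each; returns the final value.
    static long run(int threads, int perThread) {
        TASLock lock = new TASLock();
        value = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try {
                        value = value + 1; // critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 20_000)); // prints 80000
    }
}
```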

Page 75: Performance

• Experiment
– n threads
– Increment shared counter 1 million times
• How long should it take?
• How long does it take?
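One way such an experiment could be set up is sketched below. It is an assumption-laden illustration, not the harness behind the slides' graphs: the class name `CounterExperiment`, the thread counts tried, and the use of `System.nanoTime()` are all choices made here, and the timings it prints will vary by machine.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CounterExperiment {
    static final AtomicBoolean state = new AtomicBoolean(false); // test-and-set lock state
    static long counter;                                         // guarded by the lock

    // Splits `total` increments evenly across `threads` threads and
    // returns the elapsed wall-clock time in nanoseconds.
    static long timeRun(int threads, int total) {
        counter = 0;
        int share = total / threads; // assumes threads divides total
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < share; k++) {
                    while (state.getAndSet(true)) {} // acquire: spin on test-and-set
                    counter++;                       // critical section
                    state.set(false);                // release
                }
            });
        }
        long start = System.nanoTime();
        for (Thread t : workers) t.start();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 4, 8 }) {
            long nanos = timeRun(n, 1_000_000);
            System.out.println(n + " threads: " + nanos / 1_000_000 + " ms");
        }
    }
}
```

Since every increment passes through the same lock, the total work is sequential; comparing the printed times across thread counts shows how far the measured curve departs from a flat one.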

Page 76: Graph

[Figure: time versus number of threads, showing the ideal curve: no speedup because of sequential bottleneck]

Page 77: Mystery #1

[Figure: time versus number of threads, with curves for the TAS lock and the ideal]

What is going on?

Pages 78-81: Bus-Based Architectures

[Figure: processors, each with its own cache, connected by a bus to memory]

• Random access memory (10s of cycles)
• Shared bus
– Broadcast medium
– One broadcaster at a time
– Processors and memory all “snoop”
• Per-processor caches
– Small
– Fast: 1 or 2 cycles
– Address & state information

Pages 82-83: Jargon Watch

• Cache hit
– “I found what I wanted in my cache”
– Good Thing™
• Cache miss
– “I had to shlep all the way to memory for that data”
– Bad Thing™

Page 84: Cave Canem

• This model is still a simplification
– But not in any essential way
– Illustrates basic principles
• Will discuss complexities later

Pages 85-86: Processor Issues Load Request

[Figure: a processor puts “Gimme data” on the bus; the data sits in memory]

Page 87: Memory Responds

[Figure: memory answers “Got your data right here” and the data travels over the bus to the requesting processor's cache]

Pages 88-90: Processor Issues Load Request

[Figure: a processor puts “Gimme data” on the bus; another processor, whose cache already holds the data, answers “I got data”]

Pages 91-92: Other Processor Responds

[Figure: the data travels over the bus from the responding processor's cache; the data then appears in both caches and in memory]

Page 93: Art of Multiprocessor Programming1 This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License.Creative Commons Attribution- ShareAlike.

Art of Multiprocessor Programming 93

Modify Cached Data

Bus

data

memory

cachedata

data

(1)

Page 94: Art of Multiprocessor Programming1 This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License.Creative Commons Attribution- ShareAlike.

Art of Multiprocessor Programming 94

Modify Cached Data

Bus

data

memory

cachedata

data

data

(1)

Page 95: Art of Multiprocessor Programming1 This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License.Creative Commons Attribution- ShareAlike.

Art of Multiprocessor Programming 95

memory

Bus

data

Modify Cached Data

cachedata

data

Page 96: Art of Multiprocessor Programming1 This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License.Creative Commons Attribution- ShareAlike.

Art of Multiprocessor Programming 96

memory

Bus

data

Modify Cached Data

cache

What’s up with the other copies?

data

data

Cache Coherence

• We have lots of copies of data
  – Original copy in memory
  – Cached copies at processors
• Some processor modifies its own copy
  – What do we do with the others?
  – How to avoid confusion?

Write-Back Caches

• Accumulate changes in cache
• Write back when needed
  – Need the cache for something else
  – Another processor wants it
• On first modification
  – Invalidate other entries
  – Requires non-trivial protocol …

Write-Back Caches

• Cache entry has three states
  – Invalid: contains raw seething bits
  – Valid: I can read but I can’t write
  – Dirty: Data has been modified
• Intercept other load requests
• Write back to memory before using cache

Invalidate

[Figure: the writing processor broadcasts an invalidate message on the bus (“Mine, all mine!”); the other cache holding a copy reacts (“Uh, oh”) and drops it.]

• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)

Another Processor Asks for Data

[Figure: another processor issues a load request on the bus for the modified data.]

Owner Responds

[Figure: the cache that owns the modified data answers over the bus: “Here it is!”]

End of the Day …

[Figure: both caches hold a copy of the data.]

• Reading OK, no writing

Simple TASLock

• TAS invalidates cache lines
• Spinners
  – Miss in cache
  – Go to bus
• Thread wants to release lock
  – delayed behind spinners

Test-and-test-and-set

• Wait until lock “looks” free
  – Spin on local cache
  – No bus use while lock busy
• Problem: when lock is released
  – Invalidation storm …

Local Spinning while Lock is Busy

[Figure: every cache and memory hold the value “busy”; spinners read their own cached copies.]

On Release

[Figure: the releasing processor’s copy reads “free”; the other cached copies are marked invalid.]

• Everyone misses, rereads
• Everyone tries TAS

Mystery Explained

[Figure: time versus number of threads for the TAS lock, the TTAS lock, and the ideal. TTAS is better than TAS but still not as good as ideal.]

OpenMP

• (Shared Memory, Thread Based Parallelism) OpenMP is a shared-memory application programming interface (API) whose features are based on prior efforts to facilitate shared-memory parallel programming.
• (Compiler Directive Based) OpenMP provides directives, library functions and environment variables to create and control the execution of parallel programs.
• (Explicit Parallelism) OpenMP’s directives let the user tell the compiler which instructions to execute in parallel and how to distribute them among the threads that will run the code.

OpenMP example

Serial version:

#include <stdio.h>
#define N 1000    /* problem size: an assumption, the slide leaves n undeclared */

int main() {
    int i, n = N;
    double a[N], b[N], sum;

    for (i = 0; i < n; i++) {
        a[i] = i * 0.5;
        b[i] = i * 2.0;
    }
    sum = 0;
    for (i = 0; i < n; i++) {    /* same index range as the initialization */
        sum = sum + a[i] * b[i];
    }
    printf("sum = %f\n", sum);
    return 0;
}

OpenMP version:

#include <stdio.h>
#define N 1000    /* problem size: an assumption, the slide leaves n undeclared */

int main() {
    int i, n = N;
    double a[N], b[N], sum, t;

    for (i = 0; i < n; i++) {
        a[i] = i * 0.5;
        b[i] = i * 2.0;
    }
    sum = 0; t = 0;
    #pragma omp parallel private(t) \
                         shared(sum, a, b, n)
    {
        t = 0;    /* each thread starts its own partial sum */
        #pragma omp for
        for (i = 0; i < n; i++) {
            t = t + a[i] * b[i];
        }
        #pragma omp critical (update_sum)
        sum += t;    /* one thread at a time adds its partial sum */
    }
    printf("sum = %f \n", sum);
    return 0;
}

PARALLEL Region Construct

• A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.
• Starting from the beginning of this parallel region, the code is duplicated and all threads will execute that code.
• There is an implied barrier at the end of a parallel region. Only the master thread continues execution past this point.
• How many threads? The team size is determined by the following, in order of precedence:
  – Setting of the num_threads clause
  – Use of the omp_set_num_threads() library function
  – Setting of the OMP_NUM_THREADS environment variable
  – Implementation default, usually the number of CPUs on a node
• A parallel region must be a structured block that does not span multiple routines or code files.
• It is illegal to branch into or out of a parallel region.

PARALLEL Region Construct

#pragma omp parallel [clause ...] newline
        if (scalar_expression)
        private (list)
        shared (list)
        default (shared | none)
        firstprivate (list)
        num_threads (integer-expression)

    structured_block

PARALLEL Region Construct

#include <omp.h>
#include <stdio.h>

int main() {
    int nthreads, tid;

    /* Fork a team of threads with each thread having a private tid variable */
    #pragma omp parallel private(tid)
    {
        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join master thread and terminate */
    return 0;
}

Work-Sharing Constructs: for Directive

• The for directive specifies that the iterations of the loop immediately following it must be executed in parallel by the team. This assumes a parallel region has already been initiated, otherwise it executes in serial on a single processor.

Work-Sharing Constructs: for Directive

#pragma omp for [clause ...] newline
        schedule (type [,chunk])
        ordered
        private (list)
        firstprivate (list)
        lastprivate (list)
        shared (list)
        nowait

    for_loop

Work-Sharing Constructs: for Directive

#include <omp.h>
#define CHUNKSIZE 100
#define N 1000

int main() {
    int i, chunk;
    float a[N], b[N], c[N];

    /* Some initializations */
    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a,b,c,chunk) private(i)
    {
        /* Iterations are handed out dynamically, chunk at a time */
        #pragma omp for schedule(dynamic,chunk) nowait
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    } /* end of parallel region */
    return 0;
}

Work-Sharing Constructs: SECTIONS Directive

• The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.

Work-Sharing Constructs: SECTIONS Directive

#pragma omp sections [clause ...] newline
        private (list)
        firstprivate (list)
        lastprivate (list)
        nowait
{
    #pragma omp section newline
        structured_block

    #pragma omp section newline
        structured_block
}

Work-Sharing Constructs: SECTIONS Directive

#include <omp.h>
#define N 1000

int main() {
    int i;
    float a[N], b[N], c[N], d[N];

    /* Some initializations (braces keep both assignments inside the loop) */
    for (i = 0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
    }

    #pragma omp parallel shared(a,b,c,d) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];

            #pragma omp section
            for (i = 0; i < N; i++)
                d[i] = a[i] * b[i];
        } /* end of sections */
    } /* end of parallel region */
    return 0;
}

Work-Sharing Constructs: SINGLE Directive

• The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team.
• May be useful when dealing with sections of code that are not thread safe (such as I/O).

#pragma omp single [clause ...] newline
        private (list)
        firstprivate (list)
        nowait

    structured_block

Data Scope Attribute Clauses - PRIVATE

• The PRIVATE clause declares variables in its list to be private to each thread.
• A new object of the same type is declared once for each thread in the team.
• All references to the original object are replaced with references to the new object.
• Variables declared PRIVATE should be assumed to be uninitialized for each thread.

Data Scope Attribute Clauses - SHARED

• The SHARED clause declares variables in its list to be shared among all threads in the team.
• A shared variable exists in only one memory location and all threads can read or write to that address.
• It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables (such as via CRITICAL sections).

Data Scope Attribute Clauses - FIRSTPRIVATE

• The FIRSTPRIVATE clause combines the behavior of the PRIVATE clause with automatic initialization of the variables in its list.
• Listed variables are initialized according to the value of their original objects prior to entry into the parallel or work-sharing construct.

Data Scope Attribute Clauses - LASTPRIVATE

• The LASTPRIVATE clause combines the behavior of the PRIVATE clause with a copy from the last loop iteration or section to the original variable object.
• The value copied back into the original variable object is obtained from the sequentially last iteration or section of the enclosing construct. For example, the team member which executes the final iteration of a for loop, or the team member which does the last SECTION of a SECTIONS context, performs the copy with its own values.

Synchronization Constructs: CRITICAL Directive

• The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.
• If a thread is currently executing inside a CRITICAL region and another thread reaches that CRITICAL region and attempts to execute it, it will block until the first thread exits that CRITICAL region.

Synchronization Constructs: CRITICAL Directive

#include <omp.h>

int main() {
    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
        /* One thread at a time increments the shared counter */
        #pragma omp critical
        x = x + 1;
    } /* end of parallel region */
    return 0;
}

Synchronization Constructs

• MASTER Directive
  – The MASTER directive specifies a region that is to be executed only by the master thread of the team. All other threads on the team skip this section of code.
• BARRIER Directive
  – The BARRIER directive synchronizes all threads in the team.
  – When a BARRIER directive is reached, a thread will wait at that point until all other threads have reached that barrier. All threads then resume executing in parallel the code that follows the barrier.

Synchronization Constructs

• ATOMIC Directive
  – The ATOMIC directive specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it. In essence, this directive provides a mini-CRITICAL section.
  – The directive applies only to a single, immediately following statement.

Find an error

int main()
{
    for (i = 0; i < n; i++) {
        a [i] = i * 0.5;
        b [i] = i * 2.0
    }
    sum = 0; t = 0;
    #pragma omp parallel shared(sum, a, b, n)
    {
        #pragma omp for private(t)
        for (i = 1; i <= n; i++ ) {
            t = t + a[i]*b[i];
        }
        #pragma omp critical (update_sum)
        sum += t;
    }
    printf ("sum = %f \n", sum);
}

Find an error

for (i=0; i<n-1; i++)
    a[i] = a[i] + b[i];

for (i=0; i<n-1; i++)
    a[i] = a[i+1] + b[i];

Find an error

#pragma omp parallel
{
    int Xlocal = omp_get_thread_num();
    Xshared = omp_get_thread_num();
    printf("Xlocal = %d Xshared = %d\n",Xlocal,Xshared);
}

int i, j;
#pragma omp parallel for
for (i=0; i<n; i++)
    for (j=0; j<m; j++) {
        a[i][j] = compute(i,j);
    }

Find an error

void compute(int n)
{
    int i;
    double h, x, sum;

    h = 1.0/(double) n;
    sum = 0.0;
    #pragma omp for reduction(+:sum) shared(h)
    for (i=1; i <= n; i++) {
        x = h * ((double)i - 0.5);
        sum += (1.0 / (1.0 + x*x));
    }
    pi = h * sum;
}

Find an error

void main ()
{
    .............
    #pragma omp parallel for private(i,a,b)
    for (i=0; i<n; i++)
    {
        b++;
        a = b+i;
    } /*-- End of parallel for --*/
    c = a + b;
    .............
}
