Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit


What Should You Do If You Can’t Get a Lock?

• Keep trying (our focus)
– “spin” or “busy-wait”
– Good if delays are short

• Give up the processor
– Good if delays are long
– Always good on uniprocessor

Basic Spin-Lock

[Figure: threads pass through a spin lock into the critical section (CS); the lock is reset upon exit.]

• The lock introduces a sequential bottleneck
• Sequential bottleneck: no parallelism

Review: Test-and-Set

• Boolean value
• Test-and-set (TAS)
– Swap true with current value
– Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”

Review: Test-and-Set

    // Package java.util.concurrent.atomic
    public class AtomicBoolean {
      boolean value;

      public synchronized boolean getAndSet(boolean newValue) {
        // Swap old and new values
        boolean prior = value;
        value = newValue;
        return prior;
      }
    }

Review: Test-and-Set

    AtomicBoolean lock = new AtomicBoolean(false);
    …
    boolean prior = lock.getAndSet(true);

Swapping in true is called “test-and-set” or TAS.

Test-and-Set Locks

• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false

Test-and-set Lock

    class TASlock {
      // Lock state is AtomicBoolean
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        // Keep trying until lock acquired
        while (state.getAndSet(true)) {}
      }

      void unlock() {
        // Release lock by resetting state to false
        state.set(false);
      }
    }
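As a rough, runnable illustration of the TAS lock above, the sketch below wraps it in a small harness that has several threads increment a shared counter under the lock. The harness class `TASLockDemo`, its `run` method, and the thread and iteration counts are assumptions made for this example; they do not come from the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Test-and-set spin lock, following the slide.
class TASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // getAndSet(true) returns the prior value; true means the lock was already taken.
        while (state.getAndSet(true)) { }
    }

    void unlock() {
        state.set(false);
    }
}

// Hypothetical harness (not from the slides): threads increment a counter under the lock.
class TASLockDemo {
    static int counter; // only accessed while holding the lock

    static int run(int nThreads, int incrementsPerThread) {
        counter = 0;
        TASLock lock = new TASLock();
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("counter = " + run(4, 100_000));
    }
}
```

If mutual exclusion holds, no increment is lost, so the final count equals the number of threads times the increments per thread.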

Performance

• Experiment
– n threads
– Increment shared counter 1 million times
• How long should it take?
• How long does it take?

Graph

[Figure: time vs. number of threads, showing the ideal curve. Annotation: no speedup because of sequential bottleneck.]

Mystery #1

[Figure: time vs. number of threads for the TAS lock and the ideal curve.]

What is going on?

Test-and-Test-and-Set Locks

• Lurking stage
– Wait until lock “looks” free
– Spin while read returns true (lock taken)
• Pouncing stage
– As soon as lock “looks” available
– Read returns false (lock free)
– Call TAS to acquire lock
– If TAS loses, back to lurking

Test-and-test-and-set Lock

    class TTASlock {
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        while (true) {
          // Wait until lock looks free
          while (state.get()) {}
          // Then try to acquire it
          if (!state.getAndSet(true))
            return;
        }
      }
    }
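A minimal runnable sketch of the TTAS lock follows. The `unlock` method is not shown on the slide and is assumed here to be the same as in the TAS lock; the harness class `TTASLockDemo` and its parameters are likewise hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Test-and-test-and-set spin lock: read first, attempt the swap only when the lock looks free.
class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (state.get()) { }        // lurking: spin on a plain read
            if (!state.getAndSet(true)) {  // pouncing: TAS once the lock looks free
                return;
            }
            // TAS lost the race: go back to lurking
        }
    }

    // Assumption: release works as in the TAS lock.
    void unlock() {
        state.set(false);
    }
}

// Hypothetical harness (not from the slides).
class TTASLockDemo {
    static int counter; // only accessed while holding the lock

    static int run(int nThreads, int incrementsPerThread) {
        counter = 0;
        TTASLock lock = new TTASLock();
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("counter = " + run(4, 100_000));
    }
}
```

The only difference from the TAS lock is the inner read loop, which is the change the slides examine.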

Mystery #2

[Figure: time vs. number of threads for the TAS lock, the TTAS lock, and the ideal curve.]

Opinion

• Our memory abstraction is broken
• TAS & TTAS methods
– Are provably the same (in our model)
– Except they aren’t (in field tests)
• Need a more detailed model …

Bus-Based Architectures

[Figure: several caches and one memory connected by a shared bus.]

• Random access memory (10s of cycles)
• Shared bus
– Broadcast medium
– One broadcaster at a time
– Processors and memory all “snoop”
• Per-processor caches
– Small
– Fast: 1 or 2 cycles
– Address & state information

Jargon Watch

• Cache hit
– “I found what I wanted in my cache”
– Good Thing™
• Cache miss
– “I had to shlep all the way to memory for that data”
– Bad Thing™

Cave Canem

• This model is still a simplification
– But not in any essential way
– Illustrates basic principles
• Will discuss complexities later

Processor Issues Load Request

[Figure: a processor broadcasts “Gimme data” on the bus.]

Memory Responds

[Figure: memory answers on the bus: “Got your data right here”.]

Processor Issues Load Request

[Figure: another processor broadcasts “Gimme data” on the bus; a cache that already holds the data answers “I got data”.]

Other Processor Responds

[Figure: the cache holding the data supplies it over the bus; both caches now hold a copy.]

Modify Cached Data

[Figure: one processor modifies the data in its cache while copies of the data remain in another cache and in memory.]

What’s up with the other copies?

Cache Coherence

• We have lots of copies of data
– Original copy in memory
– Cached copies at processors
• Some processor modifies its own copy
– What do we do with the others?
– How to avoid confusion?

Write-Back Caches

• Accumulate changes in cache
• Write back when needed
– Need the cache for something else
– Another processor wants it
• On first modification
– Invalidate other entries
– Requires non-trivial protocol …

Write-Back Caches

• Cache entry has three states
– Invalid: contains raw seething bits
– Valid: I can read but I can’t write
– Dirty: Data has been modified
• Intercept other load requests
• Write back to memory before using cache
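The three cache-entry states can be pictured as a small state machine. The sketch below is a deliberately simplified model written for illustration; the event names, the `on` and `needsBus` methods, and the exact transitions are assumptions based on the bullets above, not a description of any real coherence protocol.

```java
// Events seen by one cache line: requests from its own processor, or traffic snooped on the bus.
enum LineEvent { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

// The three states named on the slide (simplified model).
enum LineState {
    INVALID, VALID, DIRTY;

    // Next state of this cache line after the given event.
    LineState on(LineEvent e) {
        switch (e) {
            case LOCAL_READ:   return this == INVALID ? VALID : this; // a miss loads a readable copy
            case LOCAL_WRITE:  return DIRTY;                          // the writer now holds modified data
            case REMOTE_READ:  return this == DIRTY ? VALID : this;   // owner supplies data, keeps a read-only copy
            case REMOTE_WRITE: return INVALID;                        // another cache took write permission
            default: throw new IllegalArgumentException("unknown event: " + e);
        }
    }

    // True if serving a local request in this state requires a bus transaction.
    boolean needsBus(LineEvent e) {
        return (e == LineEvent.LOCAL_READ && this == INVALID)
            || (e == LineEvent.LOCAL_WRITE && this != DIRTY);
    }
}

class CacheLineDemo {
    public static void main(String[] args) {
        LineState s = LineState.INVALID;
        for (LineEvent e : LineEvent.values()) {
            LineState next = s.on(e);
            System.out.println(s + " on " + e + " becomes " + next);
            s = next;
        }
    }
}
```

In this model, a read of a valid line never touches the bus, while a write to a line that is not already dirty always does.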

Invalidate

[Figure: a processor broadcasts an invalidate message on the bus: “Mine, all mine!”]

• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)

Another Processor Asks for Data

[Figure: another processor requests the data on the bus.]

Owner Responds

[Figure: the owning cache answers on the bus: “Here it is!”]

End of the Day …

[Figure: both caches hold the data.]

• Reading OK, no writing

Simple TASLock

• TAS invalidates cache lines
• Spinners
– Miss in cache
– Go to bus
• Thread wants to release lock
– Delayed behind spinners

Test-and-test-and-set

• Wait until lock “looks” free
– Spin on local cache
– No bus use while lock busy
• Problem: when lock is released
– Invalidation storm …

Local Spinning while Lock is Busy

[Figure: memory and every cache hold the value “busy”.]

On Release

[Figure: the releasing cache and memory hold “free”; the other caches are invalid.]

• Everyone misses, rereads
• Everyone tries TAS

Problems

• Everyone misses
– Reads satisfied sequentially
• Everyone does TAS
– Invalidates others’ caches

Mystery Explained

[Figure: time vs. number of threads for the TAS lock, the TTAS lock, and the ideal curve.]

• TTAS lock: better than TAS but still not as good as ideal

Solution: Introduce Delay

[Figure: timeline of a spin lock with successive delays d, r1d, r2d.]

• If the lock looks free
• But I fail to get it
• There must be lots of contention
• Better to back off than to collide again

Exponential Backoff Lock

    public class Backoff implements Lock {
      AtomicBoolean state = new AtomicBoolean(false);

      public void lock() {
        int delay = MIN_DELAY;            // Fix minimum delay
        while (true) {
          while (state.get()) {}          // Wait until lock looks free
          if (!state.getAndSet(true))     // If we win, return
            return;
          sleep(random() % delay);        // Back off for random duration
          if (delay < MAX_DELAY)          // Double max delay, within reason
            delay = 2 * delay;
        }
      }
    }
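The backoff lock above uses `sleep` and `random` as pseudocode. One way it might look as compilable Java is sketched below; the delay bounds, the use of `LockSupport.parkNanos` and `ThreadLocalRandom`, the `unlock` method, and the harness class are all assumptions chosen for this example rather than details taken from the slides.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BackoffLock {
    // Assumed bounds on the backoff window, in nanoseconds (tuning values are illustrative only).
    private static final int MIN_DELAY_NANOS = 1_000;
    private static final int MAX_DELAY_NANOS = 1_000_000;

    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        int limit = MIN_DELAY_NANOS;
        while (true) {
            while (state.get()) { }          // wait until the lock looks free
            if (!state.getAndSet(true)) {    // if we win, return
                return;
            }
            // Lost the race: pause for a random duration below the current limit.
            LockSupport.parkNanos(ThreadLocalRandom.current().nextInt(limit) + 1);
            if (limit < MAX_DELAY_NANOS) {   // double the limit, within reason
                limit = 2 * limit;
            }
        }
    }

    void unlock() {
        state.set(false);
    }
}

// Hypothetical harness (not from the slides).
class BackoffLockDemo {
    static int counter; // only accessed while holding the lock

    static int run(int nThreads, int incrementsPerThread) {
        counter = 0;
        BackoffLock lock = new BackoffLock();
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("counter = " + run(4, 50_000));
    }
}
```

`parkNanos` may return early, which is harmless here because the loop simply rereads the lock state.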

Spin-Waiting Overhead

[Figure: time vs. number of threads for the TTAS lock and the backoff lock.]

BUILT-IN FUNCTIONS FOR ATOMIC MEMORY ACCESS

Built-in functions for atomic memory access

    type __sync_fetch_and_add (type *ptr, type value, ...)
    type __sync_fetch_and_sub (type *ptr, type value, ...)
    type __sync_fetch_and_or (type *ptr, type value, ...)
    type __sync_fetch_and_and (type *ptr, type value, ...)
    type __sync_fetch_and_xor (type *ptr, type value, ...)
    type __sync_fetch_and_nand (type *ptr, type value, ...)

    { tmp = *ptr; *ptr op= value; return tmp; }
    { tmp = *ptr; *ptr = ~(tmp & value); return tmp; }   // nand (GCC 4.4 and later; earlier versions computed ~tmp & value)

Built-in functions for atomic memory access

    type __sync_add_and_fetch (type *ptr, type value, ...)
    type __sync_sub_and_fetch (type *ptr, type value, ...)
    type __sync_or_and_fetch (type *ptr, type value, ...)
    type __sync_and_and_fetch (type *ptr, type value, ...)
    type __sync_xor_and_fetch (type *ptr, type value, ...)
    type __sync_nand_and_fetch (type *ptr, type value, ...)

    { *ptr op= value; return *ptr; }
    { *ptr = ~(*ptr & value); return *ptr; }   // nand (GCC 4.4 and later; earlier versions computed ~*ptr & value)

Built-in functions for atomic memory access

    bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
    type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)

    type __sync_lock_test_and_set (type *ptr, type value, ...)
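For readers following the Java examples earlier in the deck, the sketch below shows rough Java counterparts of these built-ins using `java.util.concurrent.atomic`. The pairing is an approximation offered for orientation (the memory-ordering guarantees of the two families differ in detail), and the class `AtomicOpsDemo` with its `run` method is hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicOpsDemo {
    // Returns {fetch-and-add result, add-and-fetch result, first CAS ok, second CAS ok,
    //          final value, test-and-set prior value}, with booleans encoded as 0/1.
    static int[] run() {
        AtomicInteger x = new AtomicInteger(5);

        int old = x.getAndAdd(3);               // like __sync_fetch_and_add: returns 5, x becomes 8
        int now = x.addAndGet(2);               // like __sync_add_and_fetch: x becomes 10, returns 10
        boolean won = x.compareAndSet(10, 42);  // like __sync_bool_compare_and_swap: succeeds, x becomes 42
        boolean lost = x.compareAndSet(10, 7);  // expected value no longer matches: fails, x stays 42

        AtomicBoolean flag = new AtomicBoolean(false);
        boolean prior = flag.getAndSet(true);   // like __sync_lock_test_and_set: returns the old value

        return new int[] { old, now, won ? 1 : 0, lost ? 1 : 0, x.get(), prior ? 1 : 0 };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```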

OPENMP

OpenMP

• (Shared Memory, Thread Based Parallelism) OpenMP is a shared-memory application programming interface (API) whose features are based on prior efforts to facilitate shared-memory parallel programming.
• (Compiler Directive Based) OpenMP provides directives, library functions and environment variables to create and control the execution of parallel programs.
• (Explicit Parallelism) OpenMP’s directives let the user tell the compiler which instructions to execute in parallel and how to distribute them among the threads that will run the code.

OpenMP example

Serial version:

    #include <stdio.h>

    int main()
    {
      int i;
      double a[10], b[10], sum;

      for (i = 0; i < 10; i++) {
        a[i] = i * 0.5;
        b[i] = i * 2.0;
      }
      sum = 0;
      for (i = 0; i < 10; i++) {   /* same index range as the initialization */
        sum += a[i] * b[i];
      }
      printf("sum = %f\n", sum);
      return 0;
    }

OpenMP version:

    #include <stdio.h>

    int main()
    {
      int i;
      double a[10], b[10], sum, t;

      for (i = 0; i < 10; i++) {
        a[i] = i * 0.5;
        b[i] = i * 2.0;
      }
      sum = 0;
      #pragma omp parallel private(t) shared(sum, a, b)
      {
        t = 0;                     /* per-thread partial sum */
        #pragma omp for
        for (i = 0; i < 10; i++) {
          t += a[i] * b[i];
        }
        #pragma omp critical (update_sum)
        sum += t;
      }
      printf("sum = %f \n", sum);
      return 0;
    }
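The pattern in the OpenMP version (a private partial sum per thread, combined into a shared total inside a critical section) can also be written with plain threads. The Java sketch below is an analogue offered for comparison with the lock examples earlier in the deck; it is not OpenMP, and the class name, the cyclic split of iterations, and the lock object `updateSum` are assumptions made for this example.

```java
class DotProductDemo {
    static double run(int n, int nThreads) {
        double[] a = new double[n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = i * 0.5;
            b[i] = i * 2.0;
        }

        double[] sum = { 0.0 };            // shared total
        Object updateSum = new Object();   // plays the role of the named critical section

        Thread[] team = new Thread[nThreads];
        for (int id = 0; id < nThreads; id++) {
            final int tid = id;
            team[id] = new Thread(() -> {
                double t = 0.0;            // private partial sum
                // Iterations are split cyclically among the threads (one possible schedule).
                for (int i = tid; i < n; i += nThreads) {
                    t += a[i] * b[i];
                }
                synchronized (updateSum) { // one thread at a time updates the shared total
                    sum[0] += t;
                }
            });
            team[id].start();
        }
        for (Thread th : team) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println("sum = " + run(10, 4));
    }
}
```

Each thread enters the synchronized block only once, so contention on the shared total stays low regardless of the loop length.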

PARALLEL Region Construct

• A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.
• Starting from the beginning of this parallel region, the code is duplicated and all threads will execute that code.
• There is an implied barrier at the end of a parallel region. Only the master thread continues execution past this point.
• How many threads?
– Setting of the NUM_THREADS clause
– Use of the omp_set_num_threads() library function
– Setting of the OMP_NUM_THREADS environment variable
– Implementation default - usually the number of CPUs on a node
• A parallel region must be a structured block that does not span multiple routines or code files
• It is illegal to branch into or out of a parallel region

PARALLEL Region Construct

    #pragma omp parallel [clause ...] newline
                         if (scalar_expression)
                         private (list)
                         shared (list)
                         default (shared | none)
                         firstprivate (list)
                         num_threads (integer-expression)

      structured_block

PARALLEL Region Construct

    #include <omp.h>
    #include <stdio.h>

    int main()
    {
      int nthreads, tid;

      /* Fork a team of threads with each thread having a private tid variable */
      #pragma omp parallel private(tid)
      {
        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0) {
          nthreads = omp_get_num_threads();
          printf("Number of threads = %d\n", nthreads);
        }
      } /* All threads join master thread and terminate */

      return 0;
    }

Work-Sharing Constructs: for Directive

• The for directive specifies that the iterations of the loop immediately following it must be executed in parallel by the team. This assumes a parallel region has already been initiated; otherwise it executes in serial on a single processor.

    #pragma omp for [clause ...] newline
                    schedule (type [,chunk])
                    ordered
                    private (list)
                    firstprivate (list)
                    lastprivate (list)
                    shared (list)
                    nowait

      for_loop

Work-Sharing Constructs: for Directive

    #include <omp.h>
    #define CHUNKSIZE 100
    #define N 1000

    int main()
    {
      int i, chunk;
      float a[N], b[N], c[N];

      /* Some initializations */
      for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;
      chunk = CHUNKSIZE;

      #pragma omp parallel shared(a,b,c,chunk) private(i)
      {
        /* Iterations are handed out dynamically, chunk at a time */
        #pragma omp for schedule(dynamic,chunk) nowait
        for (i = 0; i < N; i++)
          c[i] = a[i] + b[i];
      } /* end of parallel section */

      return 0;
    }

Work-Sharing Constructs: SECTIONS Directive

• The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.

    #pragma omp sections [clause ...] newline
                         private (list)
                         firstprivate (list)
                         lastprivate (list)
                         nowait
    {
      #pragma omp section newline
        structured_block

      #pragma omp section newline
        structured_block
    }

Work-Sharing Constructs: SECTIONS Directive

    #include <omp.h>
    #define N 1000

    int main()
    {
      int i;
      float a[N], b[N], c[N], d[N];

      /* Some initializations (braces keep both assignments inside the loop) */
      for (i = 0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
      }

      #pragma omp parallel shared(a,b,c,d) private(i)
      {
        #pragma omp sections nowait
        {
          #pragma omp section
          for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];

          #pragma omp section
          for (i = 0; i < N; i++)
            d[i] = a[i] * b[i];
        } /* end of sections */
      } /* end of parallel section */

      return 0;
    }

Work-Sharing Constructs: SINGLE Directive

• The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team.
• May be useful when dealing with sections of code that are not thread safe (such as I/O)

    #pragma omp single [clause ...] newline
                       private (list)
                       firstprivate (list)
                       nowait

      structured_block

Data Scope Attribute Clauses - PRIVATE

• The PRIVATE clause declares variables in its list to be private to each thread.
• A new object of the same type is declared once for each thread in the team
• All references to the original object are replaced with references to the new object
• Variables declared PRIVATE should be assumed to be uninitialized for each thread

Data Scope Attribute Clauses - SHARED

• The SHARED clause declares variables in its list to be shared among all threads in the team.
• A shared variable exists in only one memory location and all threads can read or write to that address
• It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables (such as via CRITICAL sections)

Data Scope Attribute Clauses - FIRSTPRIVATE

• The FIRSTPRIVATE clause combines the behavior of the PRIVATE clause with automatic initialization of the variables in its list.
• Listed variables are initialized according to the value of their original objects prior to entry into the parallel or work-sharing construct.

Data Scope Attribute Clauses - LASTPRIVATE

• The LASTPRIVATE clause combines the behavior of the PRIVATE clause with a copy from the last loop iteration or section to the original variable object.
• The value copied back into the original variable object is obtained from the last (sequentially) iteration or section of the enclosing construct. For example, the team member which executes the final iteration for a DO section, or the team member which does the last SECTION of a SECTIONS context, performs the copy with its own values

Page 94: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Synchronization Constructs

CRITICAL Directive• The CRITICAL directive specifies a region

of code that must be executed by only one thread at a time.

• If a thread is currently executing inside a CRITICAL region and another thread reaches that CRITICAL region and attempts to execute it, it will block until the first thread exits that CRITICAL region.

94

Page 95: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Synchronization Constructs

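CRITICAL Directive

#include <omp.h>

int main(void)
{
    int x = 0;

    /* x is shared by every thread in the team */
    #pragma omp parallel shared(x)
    {
        /* only one thread at a time may execute the update */
        #pragma omp critical
        x = x + 1;
    } /* end of parallel region */

    return 0;
}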

Page 96: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Synchronization Constructs

• MASTER Directive

– The MASTER directive specifies a region that is to be executed only by the master thread of the team. All other threads on the team skip this section of code

• BARRIER Directive

– The BARRIER directive synchronizes all threads in the team.

– When a BARRIER directive is reached, a thread will wait at that point until all other threads have reached that barrier. All threads then resume executing in parallel the code that follows the barrier.
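
The two directives are often combined. The sketch below is an illustration, not taken from the slides, and the name two_phase is hypothetical. The team fills an array, waits at a BARRIER, and then only the master thread adds up the result.

```c
/* Hypothetical demo: fills data[0..n-1] with 1..n, then the master
 * thread stores the sum of the array in *total. */
void two_phase(int *data, int n, int *total)
{
    #pragma omp parallel
    {
        int i;   /* declared inside the region, so private to each thread */

        /* phase 1: the team shares the iterations; nowait drops the
         * barrier that normally ends a work-sharing loop */
        #pragma omp for nowait
        for (i = 0; i < n; i++) {
            data[i] = i + 1;
        }

        /* every thread waits here until the whole array is filled */
        #pragma omp barrier

        /* phase 2: only the master thread runs this block; the other
         * threads skip it without waiting */
        #pragma omp master
        {
            int sum = 0;
            for (i = 0; i < n; i++) {
                sum += data[i];
            }
            *total = sum;
        }
    }   /* the implicit barrier at the end of the parallel region applies here */
}
```

Without the explicit barrier the master thread could start summing while other threads are still writing their share of the array.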

Page 97: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Synchronization Constructs

• ATOMIC Directive

– The ATOMIC directive specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it. In essence, this directive provides a mini-CRITICAL section.

– The directive applies only to a single, immediately following statement
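
As an illustration of the single-statement rule, here is a small sketch with a shared counter. The function name count_even is hypothetical.

```c
/* Hypothetical demo: counts the even values in v[0..n-1]. */
int count_even(const int *v, int n)
{
    int count = 0;   /* shared by all threads */
    int i;

    #pragma omp parallel for shared(count)
    for (i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            /* protects only the one update statement that follows */
            #pragma omp atomic
            count++;
        }
    }

    return count;
}
```

The test v[i] % 2 == 0 runs in parallel without any protection; only the increment of count is serialized.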

Page 98: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Find an error

int main()
{
    for (i = 0; i < n; i++) {
        a[i] = i * 0.5;
        b[i] = i * 2.0
    }
    sum = 0; t = 0;
    #pragma omp parallel shared(sum, a, b, n)
    {
        #pragma omp for private(t)
        for (i = 1; i <= n; i++ ) {
            t = t + a[i]*b[i];
        }
        #pragma omp critical (update_sum)
        sum += t;
    }
    printf ("sum = %f \n", sum);
}
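
The slides do not give the answer. One possible reading is that private(t) leaves each thread's copy of t uninitialized, and that the parallel loop runs from 1 to n and therefore reads a[n] and b[n], one element past the end if the arrays hold n elements. A sketch of one repair under those assumptions follows. The function name dot_product and its parameters are hypothetical.

```c
/* Hypothetical repair: returns the sum of a[i] * b[i] for i in [0, n). */
double dot_product(const double *a, const double *b, int n)
{
    double sum = 0.0;
    int i;

    #pragma omp parallel shared(sum)
    {
        /* declared inside the region: private to each thread and
         * explicitly initialized */
        double t = 0.0;

        #pragma omp for
        for (i = 0; i < n; i++) {
            t += a[i] * b[i];
        }

        /* each thread adds its partial result exactly once */
        #pragma omp critical (update_sum)
        sum += t;
    }

    return sum;
}
```

A reduction(+:sum) clause on the loop would express the same idea more compactly; the version here keeps the slide's named CRITICAL region.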

Page 99: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Find an error

for (i=0; i<n-1; i++)
    a[i] = a[i] + b[i];

for (i=0; i<n-1; i++)
    a[i] = a[i+1] + b[i];
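
Assuming the exercise asks which loop can safely be given to a parallel for: the first loop touches only element i in each iteration, so its iterations are independent. The second reads a[i+1], which another thread may already have overwritten. One way around this, sketched below with the hypothetical name shift_add, is to read from a snapshot of the old values.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical repair: computes a[i] = a[i+1] + b[i] for i in [0, n-1)
 * from a copy of the old a[], so the iterations no longer depend on
 * each other. Returns 0 on success, -1 if the copy cannot be allocated. */
int shift_add(double *a, const double *b, int n)
{
    double *old;
    int i;

    if (n < 2) {
        return 0;   /* nothing to update */
    }

    old = malloc((size_t)n * sizeof *old);
    if (old == NULL) {
        return -1;
    }
    memcpy(old, a, (size_t)n * sizeof *old);

    #pragma omp parallel for
    for (i = 0; i < n - 1; i++) {
        a[i] = old[i + 1] + b[i];   /* reads only the snapshot */
    }

    free(old);
    return 0;
}
```

The price is an extra array and one copy pass, which is usually acceptable when the loop body is expensive enough to be worth parallelizing.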

Page 100: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Find an error

#pragma omp parallel
{
    int Xlocal = omp_get_thread_num();
    Xshared = omp_get_thread_num();
    printf("Xlocal = %d Xshared = %d\n",Xlocal,Xshared);
}

int i, j;
#pragma omp parallel for
for (i=0; i<n; i++)
    for (j=0; j<m; j++) {
        a[i][j] = compute(i,j);
    }
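
For the second fragment, a likely reading is that only i, the index of the parallelized loop, is made private automatically, while j is declared outside the region and is therefore shared by all threads. A sketch of one repair follows. ROWS, COLS, fill_matrix and the body of compute are stand-ins invented for the example.

```c
#define ROWS 4   /* hypothetical sizes; the slide does not give n and m */
#define COLS 5

/* Stand-in for the slide's compute(i,j). */
static int compute(int i, int j)
{
    return 10 * i + j;
}

void fill_matrix(int a[ROWS][COLS])
{
    int i, j;

    /* private(j) gives every thread its own inner-loop counter */
    #pragma omp parallel for private(j)
    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++) {
            a[i][j] = compute(i, j);
        }
    }
}
```

Declaring j inside the outer loop body would have the same effect. The first fragment has a related problem: every thread writes Xshared, so the printed value can come from any thread.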

Page 101: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Find an error

void compute(int n)
{
    int i;
    double h, x, sum;
    h = 1.0/(double) n;
    sum = 0.0;
    #pragma omp for reduction(+:sum) shared(h)
    for (i=1; i <= n; i++) {
        x = h * ((double)i - 0.5);
        sum += (1.0 / (1.0 + x*x));
    }
    pi = h * sum;
}
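
Several things could be the intended error here: the for directive does not accept a shared clause, the directive is not enclosed in a parallel region, x would be shared if it were, and pi is never declared. The sketch below is one possible repair, not the authors' answer. The name compute_pi is hypothetical, and the constant 4.0 is an assumption: the slide's 1.0 would make the result approximate pi/4 rather than pi.

```c
/* Hypothetical repair: midpoint-rule estimate of pi with n intervals.
 * Assumes n >= 1. */
double compute_pi(int n)
{
    int i;
    double h = 1.0 / (double) n;
    double sum = 0.0;

    /* combined construct: creates the team and shares the loop */
    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++) {
        double x = h * ((double) i - 0.5);   /* declared in the body, so private */
        sum += 4.0 / (1.0 + x * x);          /* 4.0 is an assumption, see above */
    }

    return h * sum;
}
```

Returning the value avoids the undeclared global pi that the slide assigns to.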

Page 102: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Find an error

void main ()
{
    .............

    #pragma omp parallel for private(i,a,b)
    for (i=0; i<n; i++)
    {
        b++;
        a = b+i;
    } /*-- End of parallel for --*/
    c = a + b;
    .............
}
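
A likely reading is that private(a,b) leaves both copies uninitialized inside the loop and leaves the originals undefined afterwards, so c = a + b uses garbage. In addition, b++ carries a value from one iteration to the next. The sketch below is one possible repair under the assumption that the serial result is what is wanted. The wrapper loop_result and the parameter b0 (the value of b before the loop) are hypothetical.

```c
/* Hypothetical repair: returns the c the serial loop would produce.
 * Assumes n >= 1. */
int loop_result(int n, int b0)
{
    int i;
    int a = 0;
    int b;

    /* lastprivate copies a from the sequentially last iteration */
    #pragma omp parallel for lastprivate(a)
    for (i = 0; i < n; i++) {
        /* serially b equals b0 + i + 1 at this point; rebuilding it
         * from i removes the dependence between iterations */
        a = (b0 + i + 1) + i;
    }

    b = b0 + n;   /* net effect of n increments */
    return a + b;
}
```

For n = 3 and b0 = 0 the serial loop ends with a = 5 and b = 3, and the sketch returns the same c = 8.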

Page 103: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Find an error

int icount;

void lib_func()
{
    icount++;
    do_lib_work();
}

main ()
{
    #pragma omp parallel
    {
        lib_func();
    }
}
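
The global icount is shared, and every thread in the team increments it without synchronization, which is presumably the error in question. A minimal sketch of one repair uses the ATOMIC directive. The empty body of do_lib_work and the wrapper run_team are stand-ins added for the example.

```c
static int icount = 0;   /* global, so shared by all threads */

/* Stand-in for the slide's do_lib_work(). */
static void do_lib_work(void)
{
}

void lib_func(void)
{
    /* the increment is now a single indivisible update */
    #pragma omp atomic
    icount++;

    do_lib_work();
}

/* Hypothetical wrapper: calls lib_func once per thread and returns
 * the counter. */
int run_team(void)
{
    #pragma omp parallel
    {
        lib_func();
    }
    return icount;
}
```

If each thread should instead keep its own count, declaring icount threadprivate would be the alternative; which one is right depends on what the library means by the counter.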

Page 104: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Find an error

work1()
{
    #pragma omp barrier
}

work2()
{
}

main()
{
    #pragma omp parallel sections
    {
        #pragma omp section
        work1();

        #pragma omp section
        work2();
    }
}

Page 105: Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

• You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work

• Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.

• Any of the above conditions can be waived if you get permission from the copyright holder.

• Nothing in this license impairs or restricts the author's moral rights.