
Chap. 6 Part 1

CIS*3090 Fall 2016


Chap 6: specific programming techniques

Languages and libraries

Authors blur the distinction

Languages: access parallel programming features explicitly or implicitly (under hood) by simply coding in the language

Java, Java threads, Go, Scala, Julia …

Vs. C/C++: no HLL support for threads (until C++11)

CUDA, OpenCL for GPU co-processor

Most languages are inherently serial

Parallel prog. hasn’t been important until recently!

Introduces many complications

Libraries

Making inherently-serial languages support parallel programming via calls to a library API

pthread.h, a good example for C/C++

pilot.h, ditto for C & Fortran

Other Pilot ports: C++, Python

mpi.h, basis of Pilot

Lots of parallel languages, libraries exist

Few have caught on


Starting with pthreads

Rationale

Already exposed in OS course

Thread definition similar to Pilot’s process definition, via work function

All communication via shared memory

Nothing stopping you from using msg. passing in shared mem as a sound IPC technique!

QNX (bought by RIM) has send/receive/reply


Compare/contrast pthreads API with Pilot’s

pthread_create

Like PI_CreateProcess + PI_StartAll

Thread is candidate for execution immediately!

pthread_t is handle similar to PI_PROCESS*

Like Pilot, one thread function can serve for multiple threads

Distinguish instances via void* arg

Like Pilot’s index & void* args

1st call to pthread_create effectively converts main() from a process into a thread itself (cf. Pilot’s PI_MAIN)
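For concreteness, a minimal sketch (my own, not the textbook’s; names are illustrative): one work function serves four threads, each distinguished by its void* argument, and each pthread_t handle is later joined.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long id = (long)arg;                      /* instance number passed at create time */
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[4];                         /* handles, similar to PI_PROCESS* */
    for (long i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);   /* candidate to run immediately */
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return 0;
}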


Bound vs. unbound threads

May need to set thread attributes

“Bound”: each thread gets its own core

(provided #threads ≤ #cores)

This is Pthreads’ “system” contention scope: every thread is an equal contender for CPU

“Unbound” = “process” contention scope

Process’s threads treated as a group (less CPU)

Default may be OS-specific:

1 core:1 thread; 1:N; N:M

Can also specify “scheduling policy” like FIFO or RR, and set thread priority
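A sketch of requesting a “bound” thread through the attribute object (assuming a platform that honors PTHREAD_SCOPE_SYSTEM; the helper name is made up):

#include <pthread.h>

int create_bound_thread(pthread_t *tid, void *(*work)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);   /* "bound": contend with all threads on the host */
    /* could also set policy/priority here, e.g. pthread_attr_setschedpolicy(&attr, SCHED_RR) */
    int rc = pthread_create(tid, &attr, work, arg);
    pthread_attr_destroy(&attr);
    return rc;
}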


pthread_join ~ PI_StopMain

Wait for thread exit and reap its status

Done by master, or any thread with handle on “joinee” thread

Status is specified as a void*

Can pass a value cast to (void*)

If really passing a pointer, make sure its target doesn’t go out of scope when the thread exits!

Static storage address will still be valid

A pointer to a stack variable is dumb!

PI_StopMain does barrier with all processes
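A sketch of reaping a status, returning a small value cast to void* rather than a stack address (names are illustrative):

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>

static void *worker(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)42;              /* a value cast to void*, not a stack address */
}

int main(void)
{
    pthread_t tid;
    void *status;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, &status);               /* wait for thread exit and reap its status */
    printf("worker returned %ld\n", (long)(intptr_t)status);
    return 0;
}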


“Detached” threads

“Detached” thread attribute opposite of “joinable”

Left to finish independently

Can’t return a status

Also, pthread_detach() changes joinable thread to detached
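A sketch of creating a thread detached from the start via the attribute object (the helper name is made up); pthread_detach() is the after-the-fact alternative:

#include <pthread.h>

int spawn_detached(void *(*work)(void *), void *arg)
{
    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, work, arg);   /* will never be joined; returns no status */
    pthread_attr_destroy(&attr);
    return rc;
}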


How “main” thread ends

Can return or call exit()

Terminates process and any remaining running threads (including detached)

Can call pthread_exit()

Leaves other threads running

Running unjoined and detached threads keep whole process alive

Likely not what you wanted!
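A sketch of the pthread_exit() case (names are illustrative): main ends, but the still-joinable worker keeps the process alive.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    sleep(1);
    printf("still running after main ended\n");
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_exit(NULL);    /* ends only the main thread; the unjoined worker keeps the process alive */
}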


How spawned threads end

Normal way: work function returns

return(status) or call pthread_exit()

Waits to join if joinable, or dies if detached

Can also be cancelled by other thread

Tricky, could leave mess (e.g., locked mutexes)

Complex: the thread could be inside a system call (e.g., I/O), which may or may not be a “cancellation point”

Possible to define “cleanup function” to be called upon cancellation

Best to stay away from this!


Inter-thread synchronization

Mutex: lock, unlock, trylock

Should initialize to “unlocked” via PTHREAD_MUTEX_INITIALIZER

No “fairness” for multiple waiters

Not necessarily FIFO queue (organize it yourself)

(Counting) Semaphore:

Need additional #include <semaphore.h>

sem_init(value), sem_wait, sem_post (aka signal), sem_getvalue, sem_trywait
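A sketch using both primitives together (names and the initial count are illustrative):

#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;   /* starts "unlocked" */
static sem_t slots;

void setup(void)
{
    sem_init(&slots, 0, 4);          /* counting semaphore: 0 = share among threads, initial value 4 */
}

void use_resource(void)
{
    sem_wait(&slots);                /* decrement; block if zero */
    pthread_mutex_lock(&lock);
    /* ... critical section ... */
    pthread_mutex_unlock(&lock);
    sem_post(&slots);                /* increment; wake one waiter */
}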


Condition variables

Solves problem of holding onto mutex while waiting for (logical) condition to occur

I.e., waiting inside a critical section

Associated with a mutex

wait, timedwait, signal, broadcast (=signal all waiters)


Classic producer/consumer

How can producer wait for room to open up in “full” buffer without releasing the buffer’s mutex lock?

Prevents consumer from removing an item!

One solution is for producer to give up the lock and check back “after a while”

But we don’t want it busy-waiting, nor to keep waking up on a timer uselessly when the condition hasn’t changed


The magic of cond_wait

Waiters for the condition to change call cond_wait, which covertly gives up the associated mutex before blocking

When it returns, the mutex has already been reacquired!

Important: mutex lock is associated with some shared data structure (e.g., buffer)

Whoever is accessing data structure needs to use SAME mutex

Can be multiple condition variables associated with same mutex


Condition variable: Basic discipline

Waiter first acquires associated mutex…

finds the logical condition false, so calls cond_wait() to wait for the condition to become true, and blocks

Any party that wants to signal that condition is now (potentially) fulfilled…

calls cond_signal(), which wakes up waiter(s)

Signaler only needs to acquire mutex if messing with associated data structure


Main opportunity for failure

Returning from cond_wait does NOT necessarily mean that the condition is true despite having been signaled!!!

Library is allowed to wake up multiple waiters from one cond_signal (sorry to say)

All have to contend for reacquiring mutex

Only one at a time will succeed, and return from its cond_wait call

Another one will not return till earlier one releases the mutex, by which time condition may have changed


What waiter must do

So, as a cond_wait caller, upon waking:

You DO KNOW that you have exclusive use of the associated data structure

But you CAN’T ASSUME that the cond_signaled condition is (still) true

Ergo, MUST recheck condition

Another woken waiter may have changed the condition (e.g., re-emptied or re-filled the buffer)

Those who assume wakeup from cond_wait means condition good to go have buggy code!


Unsafe condition variable use

Fig 6.4: circular buffer, put/get indexes

Shared buffer protected by mutex lock

C.v. nonempty for producer to signal

Inserting item and signaling non-empty condition must be within same critical section!

If not, the signal could be missed in the gap between the waiter finding the buffer empty and its call to cond_wait

Consider both 1) modifying the buffer and 2) signaling the change as part of the same “locked” transaction



Figure 6.4: Example of why a signaling thread needs to be protected by a mutex.

1. Consumer (right column) locks mutex

2. Consumer checks if buffer is empty (put==get), but before it can call cond_wait…

3. Producer (left column) inserts item and calls cond_signal

4. The signal is lost, because no one is waiting yet!

5. Now when the consumer calls cond_wait, it will not wake up

Solution is for the producer to lock the mutex around (before/after) the insert and cond_signal. This forces the insert to run after cond_wait releases the mutex (arrow in the figure), so the signal will wake up the waiting consumer.

Fig 6.5 needs repairs

Meant to show that…

3 critical sections (C/S) pertaining to the same buffer should use the same mutex; then every possible execution sequence is safe

Even if multiple consumers, this works

Test of buffer-empty is in while loop

Re-executes test anytime cond_wait returns

So test and removal occur w/in same C/S



Fig 6.3’s code split into three critical sections:

Producer’s critical section A (Fig 6.3’s insert( )):
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)
    unlock(&mutex)

Consumer’s critical section B (Fig 6.3’s remove( )):
    lock(&mutex)
    while(put==get)
        pthread_cond_wait(&nonempty,&mutex)    /* waiting for new items gives up mutex */

Consumer’s critical section C (remainder of remove( )):
    /* remove item */
    unlock(&mutex)

(In the cases that follow, the producer is the signaling thread and the consumer is the waiting thread.)

CASE 1: A B C (producer followed by consumer)

Producer:
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)    /* no one waiting for the signal, but it doesn't
                                         matter, since the consumer will find the buffer
                                         non-empty */
    unlock(&mutex)

Consumer:
    lock(&mutex)
    while(put==get)                   /* false: has data, so cond_wait is skipped */
        pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)


CASE 2: B A C (consumer finds empty buffer)

Consumer:
    lock(&mutex)
    while(put==get)                   /* true: no data */
        pthread_cond_wait(&nonempty,&mutex)    /* blocks, giving up the mutex */

Producer:
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)    /* consumer is waiting and gets the wakeup */
    unlock(&mutex)

Consumer (resuming from cond_wait with the mutex reacquired):
    /* remove item */
    unlock(&mutex)

CASE 3: B C A (consumer followed by producer)

Consumer:
    lock(&mutex)
    while(put==get)                   /* false: has data, so cond_wait is skipped */
        pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)

Producer:
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)    /* no one waiting for the signal, but it doesn't
                                         matter, since a consumer will (later) find the
                                         buffer non-empty */
    unlock(&mutex)
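Pulling the pieces together, a condensed runnable sketch of the repaired discipline (my own version, not Fig 6.5 verbatim): one mutex guards the buffer, the signal happens inside the producer’s critical section, and the consumer retests the while condition on every wakeup. A second condition variable for the buffer-full case is omitted, as in the figure.

#include <pthread.h>

#define SIZE 8

static int buffer[SIZE];
static int put = 0, get = 0;                        /* put == get means empty */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

void insert(int item)                               /* producer: critical section A */
{
    pthread_mutex_lock(&lock);
    buffer[put] = item;
    put = (put + 1) % SIZE;
    pthread_cond_signal(&nonempty);                 /* signal inside the same critical section */
    pthread_mutex_unlock(&lock);
}

int removeitem(void)                                /* consumer: critical sections B and C */
{
    pthread_mutex_lock(&lock);
    while (put == get)                              /* retest on every wakeup */
        pthread_cond_wait(&nonempty, &lock);        /* gives up the lock while blocked */
    int item = buffer[get];
    get = (get + 1) % SIZE;
    pthread_mutex_unlock(&lock);
    return item;
}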

Multiple Condition Vars. (p159)

See any problem with this?


EatJuicyFruit()
{
    pthread_mutex_lock(&lock);
    while( apples==0 || oranges==0 )
    {
        pthread_cond_wait( &more_apples, &lock );
        pthread_cond_wait( &more_oranges, &lock );
    }
    /* CRITICAL SECTION: eat both an apple and an orange */
    pthread_mutex_unlock(&lock);
}

Solution

Involves proper use of cond_wait (see the sketch below)

Shows how tricky pthreads code is to write correctly!

Pilot: much less opportunity for deadlocks, by comparison

Also: message-passing handles both communication and synchronization!

Pthreads API only does inter-thread sync (you’re using global variables for comm.)
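One possible repair of EatJuicyFruit’s waiting loop (a sketch, not necessarily the book’s exact solution; it assumes the same globals as the listing above): wait only on the condition variable for whichever fruit is actually missing, and recheck both counts after every wakeup.

void EatJuicyFruit(void)
{
    pthread_mutex_lock(&lock);
    while( apples==0 || oranges==0 )
    {
        if (apples == 0)
            pthread_cond_wait( &more_apples, &lock );   /* wait only for the missing fruit */
        else
            pthread_cond_wait( &more_oranges, &lock );
        /* loop retests both counts after every wakeup */
    }
    /* CRITICAL SECTION: eat both an apple and an orange */
    pthread_mutex_unlock(&lock);
}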


Thread-specific data (TSD) (lots of typos)

“Variable that is global in scope (to all functions) but having different values for each thread”

Identified by a “key” (pthread_key_create function)

Each TS variable needs its own key

Use “setspecific” and “getspecific” with key:value pair

Benefit: lower-level funcs can access these values without passing them down as args
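A minimal sketch (key name, stored value, and destructor are made up): one key, each thread stores its own heap value, and the per-key destructor frees it automatically at thread exit.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t my_key;              /* one key per thread-specific variable */

static void destructor(void *val)         /* called automatically when a thread exits */
{
    free(val);
}

static void *worker(void *arg)
{
    int *val = malloc(sizeof *val);
    *val = (int)(long)arg;
    pthread_setspecific(my_key, val);     /* this thread's private value */
    /* ... lower-level functions can now call pthread_getspecific(my_key) ... */
    printf("my value: %d\n", *(int *)pthread_getspecific(my_key));
    return NULL;
}

int main(void)
{
    pthread_key_create(&my_key, destructor);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}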


Drawbacks to TSD

Not terribly efficient since accessed via function call

“Don’t place in inner loops”

Can set up per-key destructor function

Useful for OOP (C++)

When thread exits, automatically called

C++11 has TSD


Safety issues

Deadlocks (familiar from CIS*3110)

Lock hierarchies:

When thread needs to acquire more than one mutex at a time

Make rule that they be acquired in consistent order (e.g. alphabetical by variable name)

Prevents circular wait

Unfortunately no easy way to enforce!
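A sketch of the rule in code (lock names and function are illustrative): every thread that needs both mutexes takes them in the same fixed order.

#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

void transfer(void)
{
    pthread_mutex_lock(&lock_a);      /* every thread takes lock_a before lock_b */
    pthread_mutex_lock(&lock_b);
    /* ... touch data guarded by both locks ... */
    pthread_mutex_unlock(&lock_b);    /* release in reverse order */
    pthread_mutex_unlock(&lock_a);
}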


Monitors

Very “OO”: encapsulates shared data with methods that manipulate it

Methods take care of acquiring needed mutexes, deal with cond. vars.

Prevents programmer logic errors leading to deadlock or violation of C/S by hiding mutex/cv’s

Not provided directly in pthread.h

Build yourself using cond. vars. (pattern in book)
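A sketch of a hand-built monitor in C (my own pattern, not the book’s code): the shared counter and its mutex/condition variable are hidden behind access functions.

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    int             count;
} counter_monitor;

void cm_init(counter_monitor *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->nonzero, NULL);
    m->count = 0;
}

void cm_increment(counter_monitor *m)
{
    pthread_mutex_lock(&m->lock);      /* the method acquires the mutex itself */
    m->count++;
    pthread_cond_signal(&m->nonzero);
    pthread_mutex_unlock(&m->lock);
}

void cm_decrement_when_positive(counter_monitor *m)
{
    pthread_mutex_lock(&m->lock);
    while (m->count == 0)              /* recheck after every wakeup */
        pthread_cond_wait(&m->nonzero, &m->lock);
    m->count--;
    pthread_mutex_unlock(&m->lock);
}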


Next time

Look at Successive Over-relaxation case study (p174-187) to prepare
