Chap. 6 Part 1 - University of Guelphgardnerw/courses/cis3090/lectures/ch6-1.pdf · 2016-10-24 ·...
Chap 6: specific programming techniques
Languages and libraries
Authors blur the distinction
Languages: access parallel programming features explicitly or implicitly (under hood) by simply coding in the language
Java, Java threads, Go, Scala, Julia …
Vs. C/C++ no HLL support for threads (till C++11)
CUDA, OpenCL for GPU co-processor
Most languages are inherently serial
Parallel programming wasn’t important until recently!
Introduces many complications
Fall 2016 CIS*3090 Parallel Programming 2
Libraries
Making inherently-serial languages support parallel programming via calls to a library API
pthread.h (Pthreads), good example for C/C++
pilot.h, ditto for C & Fortran
Other Pilot ports: C++, Python
mpi.h, basis of Pilot
Lots of parallel languages, libraries exist
Few have caught on
Starting with pthreads
Rationale
Already exposed in OS course
Thread definition similar to Pilot’s process definition, via work function
All communication via shared memory
Nothing stopping you using msg. passing in shared mem as sound IPC technique!
QNX (bought by RIM) has send/receive/reply
Compare/contrast pthreads API with Pilot’s
pthread_create
Like PI_CreateProcess + PI_StartAll
Thread is candidate for execution immediately!
pthread_t is handle similar to PI_PROCESS*
Like Pilot, one thread function can serve for multiple threads
Distinguish instances via void* arg
Like Pilot’s index & void* args
1st call to pthread_create converts main() from process into a thread itself (PI_MAIN)
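As a minimal sketch of the points above: one work function serves several threads, and each instance is told apart by its void* argument. The names work_arg, work, run_workers and NUM_THREADS are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4   /* hypothetical thread count */

/* Illustrative per-thread argument: index in, result out. */
struct work_arg {
    int index;
    int result;
};

/* One work function serves every thread; the void* arg tells instances apart. */
static void *work(void *p)
{
    struct work_arg *arg = p;
    arg->result = arg->index * arg->index;
    return NULL;
}

/* Creates the threads, waits for all of them, and returns the sum of results. */
int run_workers(void)
{
    pthread_t tid[NUM_THREADS];
    struct work_arg args[NUM_THREADS];   /* stays in scope until all joins finish */
    int sum = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].index = i;
        args[i].result = 0;
        /* The new thread may start running before this call even returns. */
        int rc = pthread_create(&tid[i], NULL, work, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tid[i], NULL);
        sum += args[i].result;
    }
    return sum;
}
```

Calling run_workers() from main() and compiling with -pthread should be enough to try it. Unlike Pilot, there is no separate "start all" step.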
Bound vs. unbound threads
May need to set thread attributes
“Bound”: each thread gets its own core
(provided #threads ≤ #cores)
This is Pthreads’ “system” contention scope: every thread is an equal contender for CPU
“Unbound” = “process” contention scope
Process’s threads treated as a group (less CPU)
Default may be OS-specific:
1 core:1 thread; 1:N; N:M
Can also specify “scheduling policy” like FIFO or RR, and set thread priority
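A hedged sketch of requesting "system" contention scope through a thread attribute object. The function name create_system_scope_thread is hypothetical. Whether PTHREAD_SCOPE_PROCESS is available is platform-dependent (some systems reject it with ENOTSUP), so only the system scope is requested here.

```c
#include <assert.h>
#include <pthread.h>

static void *noop(void *arg) { return arg; }

/* Returns 0 on success, otherwise the error number of the first failing call. */
int create_system_scope_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int scope = -1;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for "system" contention scope ("bound" in the slides' terms).
       PTHREAD_SCOPE_PROCESS would be the "unbound" alternative. */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        /* Without this, the scheduling attributes in attr may be ignored
           in favour of those inherited from the creating thread. */
        rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0) {
        pthread_attr_getscope(&attr, &scope);
        assert(scope == PTHREAD_SCOPE_SYSTEM);
        rc = pthread_create(&tid, &attr, noop, NULL);
    }
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Scheduling policy and priority would be set on the same attribute object with pthread_attr_setschedpolicy() and pthread_attr_setschedparam(); real-time policies such as FIFO or RR usually need extra privileges.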
pthread_join ~ PI_StopMain
Wait for thread exit and reap its status
Done by master, or any thread with handle on “joinee” thread
Status is specified as a void*
Can pass a value cast to (void*)
If really passing a pointer, make sure its target doesn’t go out of scope when the thread exits!
Static storage address will still be valid
Pointer to the exiting thread’s stack variable is a bug: it dangles once the thread is gone
PI_StopMain does barrier with all processes
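The two ways of returning a status can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and heap storage (malloc) is used for the pointer case as an alternative to the static storage mentioned above, since it also outlives the thread's stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns a small integer status by value, cast to void*. */
static void *count_to(void *arg)
{
    intptr_t n = (intptr_t)arg;
    intptr_t total = 0;
    for (intptr_t i = 1; i <= n; i++)
        total += i;
    return (void *)total;
}

/* Returns a pointer to heap storage, which outlives the thread's stack. */
static void *make_square(void *arg)
{
    int n = (int)(intptr_t)arg;
    int *out = malloc(sizeof *out);
    if (out != NULL)
        *out = n * n;
    return out;   /* the joiner is responsible for free() */
}

/* Joins a thread and reaps a status that was passed by value. */
long join_value_status(long n)
{
    pthread_t tid;
    void *status = NULL;

    int rc = pthread_create(&tid, NULL, count_to, (void *)(intptr_t)n);
    assert(rc == 0);
    (void)rc;
    pthread_join(tid, &status);
    return (long)(intptr_t)status;
}

/* Joins a thread and reaps a status that really is a pointer. */
int join_pointer_status(int n)
{
    pthread_t tid;
    void *status = NULL;
    int value = -1;

    int rc = pthread_create(&tid, NULL, make_square, (void *)(intptr_t)n);
    assert(rc == 0);
    (void)rc;
    pthread_join(tid, &status);
    if (status != NULL) {
        value = *(int *)status;
        free(status);
    }
    return value;
}
```

Returning the address of a local variable of count_to or make_square instead would hand the joiner a dangling pointer.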
“Detached” threads
“Detached” thread attribute opposite of “joinable”
Left to finish independently
Can’t return a status
Also, pthread_detach() changes joinable thread to detached
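A minimal sketch of creating a thread in the detached state. Because a detached thread cannot be joined or return a status, this example reports completion through a shared flag guarded by a mutex and condition variable; that reporting mechanism, and the names background and run_detached, are assumptions made for the illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static int done = 0;

static void *background(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done = 1;
    pthread_cond_signal(&done_cv);
    pthread_mutex_unlock(&done_lock);
    return NULL;   /* the return value of a detached thread is discarded */
}

/* Starts a detached thread and waits for its flag; returns 1 on success. */
int run_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int result;

    done = 0;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, background, NULL);
    pthread_attr_destroy(&attr);
    assert(rc == 0);
    if (rc != 0)
        return -1;

    /* No pthread_join() here: the thread is not joinable. */
    pthread_mutex_lock(&done_lock);
    while (!done)
        pthread_cond_wait(&done_cv, &done_lock);
    result = done;
    pthread_mutex_unlock(&done_lock);
    return result;
}
```

Calling pthread_detach(tid) on an already running joinable thread would have the same effect as the attribute.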
How “main” thread ends
Can return or call exit()
Terminates process and any remaining running threads (including detached)
Can call pthread_exit()
Leaves other threads running
Running unjoined and detached threads keep whole process alive
Likely not what you wanted!
How spawned threads end
Normal way: work function returns
return(status) or call pthread_exit()
Waits to join if joinable, or dies if detached
Can also be cancelled by other thread
Tricky, could leave mess (e.g., locked mutexes)
Complex, could be inside a system call (e.g., I/O), which may/not be “cancellation point”
Possible to define “cleanup function” to be called upon cancellation
Best to stay away from this!
Inter-thread synchronization
Mutex: lock, unlock, trylock
Should initialize to “unlocked” via PTHREAD_MUTEX_INITIALIZER
No “fairness” for multiple waiters
Not necessarily FIFO queue (organize it yourself)
(Counting) Semaphore:
Need additional #include <semaphore.h>
init(value), wait, post (aka signal), getvalue, trywait
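A short sketch of both primitives, with hypothetical names throughout. The first function protects a shared counter with a statically initialized mutex; the second exercises the semaphore calls single-threaded, using sem_trywait so the example cannot block. Unnamed semaphores via sem_init are assumed to be available (they are on Linux, but not on every platform).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NUM_THREADS 4        /* hypothetical sizes */
#define INCREMENTS  100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;  /* starts unlocked */
static long counter = 0;

static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                          /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs the incrementing threads and returns the final counter value. */
long count_with_mutex(void)
{
    pthread_t tid[NUM_THREADS];

    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, add_many, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Returns the semaphore value after one successful trywait, or -1. */
int semaphore_value_after_trywait(unsigned initial)
{
    sem_t sem;
    int value = -1;

    if (sem_init(&sem, 0, initial) != 0)    /* 0 = shared between threads only */
        return -1;
    if (sem_trywait(&sem) == 0) {           /* decrements without blocking */
        sem_getvalue(&sem, &value);
        sem_post(&sem);                     /* aka signal: increments again */
    }
    sem_destroy(&sem);
    return value;
}
```

Without the lock/unlock pair around counter++, the final count would usually fall short because increments from different threads overwrite each other.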
Condition variables
Solves problem of holding onto mutex while waiting for (logical) condition to occur
I.e., waiting inside a critical section
Associated with a mutex
wait, timedwait, signal, broadcast (=signal all waiters)
Classic producer/consumer
How can producer wait for room to open up in “full” buffer without releasing the buffer’s mutex lock?
Prevents consumer from removing an item!
One solution is for producer to give up the lock and check back “after awhile”
But we don’t want it busy-waiting, nor to keep waking up on a timer uselessly when the condition hasn’t changed
The magic of cond_wait
Waiters for the condition to change call cond_wait, which covertly gives up the associated mutex before blocking
When cond_wait returns, the mutex has already been reacquired!
Important: mutex lock is associated with some shared data structure (e.g., buffer)
Whoever is accessing data structure needs to use SAME mutex
Can be multiple condition variables associated with same mutex
Condition variable: Basic discipline
Waiter first acquires associated mutex…
finds logical condition false, so calls cond_wait() to wait for condition to become true, and blocks
Any party that wants to signal that condition is now (potentially) fulfilled…
calls cond_signal(), which wakes up waiters
Signaler only needs to acquire mutex if messing with associated data structure
Main opportunity for failure
Returning from cond_wait does NOT necessarily mean that the condition is true despite having been signaled!!!
Library is allowed to wake up multiple waiters from one cond_signal (sorry to say)
All have to contend for reacquiring mutex
Only one at a time will succeed, and return from its cond_wait call
Another one will not return till earlier one releases the mutex, by which time condition may have changed
What waiter must do
So, as a cond_wait caller, upon waking:
You DO KNOW that you have exclusive use of the associated data structure
But you CAN’T ASSUME that the cond_signaled condition is (still) true
Ergo, MUST recheck condition
Another woken waiter may have changed condition (e.g., re-emptied or re-filled the buffer)
Those who assume wakeup from cond_wait means condition good to go have buggy code!
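The recheck rule can be sketched with several waiters competing for "tokens". This is an illustration only: tokens, take_token and run_token_demo are hypothetical names. The signaler deliberately uses broadcast, so every waiter wakes each time and all but one must find the condition false again and go back to waiting.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WAITERS 3   /* hypothetical */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t token_available = PTHREAD_COND_INITIALIZER;
static int tokens = 0;   /* shared data protected by lock */
static int taken = 0;

static void *take_token(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (tokens == 0)                     /* recheck after EVERY wakeup */
        pthread_cond_wait(&token_available, &lock);
    tokens--;   /* safe: we hold the lock and have just seen tokens > 0 */
    taken++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns the number of tokens taken, or -1 if the count went wrong. */
int run_token_demo(void)
{
    pthread_t tid[NUM_WAITERS];
    int result;

    tokens = 0;
    taken = 0;
    for (int i = 0; i < NUM_WAITERS; i++) {
        int rc = pthread_create(&tid[i], NULL, take_token, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_WAITERS; i++) {
        pthread_mutex_lock(&lock);
        tokens++;                           /* change the shared data ... */
        pthread_cond_broadcast(&token_available);  /* ... and wake all waiters */
        pthread_mutex_unlock(&lock);
    }
    for (int i = 0; i < NUM_WAITERS; i++)
        pthread_join(tid[i], NULL);
    result = (tokens == 0) ? taken : -1;
    return result;
}
```

Replacing the while with an if would let a second woken waiter decrement tokens below zero after the first one had already taken the only token.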
Unsafe condition variable use
Fig 6.4: circular buffer, put/get indexes
Shared buffer protected by mutex lock
C.v. nonempty for producer to signal
Inserting item and signaling non-empty condition must be within same critical section!
If not, signal could be missed between waiter finding buffer is empty and waiting for it to fill
Consider both 1) modifying the buffer and 2) signaling the change as part of the same “locked” transaction
Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 6-19
Figure 6.4: Example of why a signaling thread needs to be protected by a mutex. (Figure not reproduced: producer code in the left column, consumer code in the right column.)
1. Consumer (right column) locks mutex
2. Consumer checks if buffer empty (put==get), but before it can call cond_wait…
3. Producer (left column) inserts item and calls cond_signal
4. The signal is lost, because no one is waiting yet!
5. Now when consumer calls cond_wait, it will not wake up
Solution is for producer to lock mutex around (before/after) insert and cond_signal. This will force the insert to run after cond_wait releases the mutex (arrow), then the signal will wake up the waiting consumer.
Fig 6.5 needs repairs
Meant to show that…
3 critical sections (C/S) pertaining to same buffer should use same mutex; then every possible execution sequence is safe
Even if multiple consumers, this works
Test of buffer-empty is in while loop
Re-executes test anytime cond_wait returns
So test and removal occur w/in same C/S
Signaling thread (producer), critical section A: Fig 6.3’s insert( )
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)
    unlock(&mutex)
Waiting thread (consumer), critical section B: Fig 6.3’s remove( )
    lock(&mutex)
    while(put==get)
        pthread_cond_wait(&nonempty,&mutex)   /* waiting for new items gives up mutex */
Waiting thread (consumer), critical section C
    /* remove item */
    unlock(&mutex)
CASE 1: A B C (producer followed by consumer)
Producer (A):
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)   /* no one waiting for signal, but doesn’t matter since consumer will find buffer non-empty */
    unlock(&mutex)
Consumer (B, C):
    lock(&mutex)
    while(put==get)   /* false: has data */
        pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)
CASE 2: B A C (consumer finds empty buffer)
Consumer (B):
    lock(&mutex)
    while(put==get)   /* true: no data */
        pthread_cond_wait(&nonempty,&mutex)
Producer (A):
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)   /* consumer is waiting and gets wakeup */
    unlock(&mutex)
Consumer (C):
    /* remove item */
    unlock(&mutex)
CASE 3: B C A (consumer followed by producer)
Consumer (B, C):
    lock(&mutex)
    while(put==get)   /* false: has data */
        pthread_cond_wait(&nonempty,&mutex)
    /* remove item */
    unlock(&mutex)
Producer (A):
    lock(&mutex)
    /* insert item */
    pthread_cond_signal(&nonempty)   /* no one waiting for signal, but doesn’t matter since consumer will (later) find buffer non-empty */
    unlock(&mutex)
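The critical sections above can be assembled into a runnable sketch of the producer/consumer buffer. Several details are assumptions rather than the book's code: the buffer size, a second condition variable nonfull so the producer can wait for room, put and get kept as running totals (slot = index % BUF_SIZE), and the name remove_item (plain remove would clash with the C library's remove()).

```c
#include <assert.h>
#include <pthread.h>

#define BUF_SIZE  4      /* hypothetical sizes */
#define NUM_ITEMS 100

static int buffer[BUF_SIZE];
static unsigned put = 0, get = 0;   /* totals inserted/removed so far */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t nonfull = PTHREAD_COND_INITIALIZER;

/* Critical section A: insert and signal under the same lock. */
static void insert(int item)
{
    pthread_mutex_lock(&mutex);
    while (put - get == BUF_SIZE)           /* full: wait for room */
        pthread_cond_wait(&nonfull, &mutex);
    buffer[put % BUF_SIZE] = item;
    put++;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&mutex);
}

/* Critical sections B and C: test-in-loop, then remove, under one lock. */
static int remove_item(void)
{
    pthread_mutex_lock(&mutex);
    while (put == get)                      /* empty: wait for an item */
        pthread_cond_wait(&nonempty, &mutex);
    int item = buffer[get % BUF_SIZE];
    get++;
    pthread_cond_signal(&nonfull);
    pthread_mutex_unlock(&mutex);
    return item;
}

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= NUM_ITEMS; i++)
        insert(i);
    return NULL;
}

/* The calling thread acts as consumer; returns the sum of items received. */
int run_producer_consumer(void)
{
    pthread_t tid;
    int sum = 0;

    put = get = 0;
    int rc = pthread_create(&tid, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < NUM_ITEMS; i++)
        sum += remove_item();
    pthread_join(tid, NULL);
    return sum;
}
```

Whichever of Cases 1 to 3 occurs on a given item, the consumer either finds data immediately or is woken after the insert, because all three critical sections use the same mutex.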
Multiple Condition Vars. (p159)
See any problem with this?
EatJuicyFruit()
{
    pthread_mutex_lock(&lock);
    while( apples==0 || oranges == 0 )
    {
        pthread_cond_wait( &more_apples, &lock );
        pthread_cond_wait( &more_oranges, &lock );
    }
    /* CRITICAL SECTION: eat both an apple and an orange */
    pthread_mutex_unlock(&lock);
}
Solution
Involves proper use of cond_wait
Shows how tricky pthreads code is to write correctly!
Pilot: much less opportunity for deadlocks by comparison
Also: message-passing handles both communication and synchronization!
Pthreads API only does inter-thread sync (you’re using global variables for comm.)
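One possible repair of EatJuicyFruit(), offered as a sketch and not necessarily the solution given in the book. The original waits on both condition variables in every loop iteration, so a thread can block on more_oranges even though oranges are already available. Here each iteration waits only for the fruit that is actually missing and then rechecks both counts. The supplier add_apple and the driver run_fruit_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t more_apples = PTHREAD_COND_INITIALIZER;
static pthread_cond_t more_oranges = PTHREAD_COND_INITIALIZER;
static int apples = 0, oranges = 0, pairs_eaten = 0;

void EatJuicyFruit(void)
{
    pthread_mutex_lock(&lock);
    while (apples == 0 || oranges == 0) {
        /* Wait once, for the missing fruit only, then recheck both. */
        if (apples == 0)
            pthread_cond_wait(&more_apples, &lock);
        else
            pthread_cond_wait(&more_oranges, &lock);
    }
    /* CRITICAL SECTION: eat both an apple and an orange */
    apples--;
    oranges--;
    pairs_eaten++;
    pthread_mutex_unlock(&lock);
}

/* Hypothetical supplier: adds one apple and signals under the lock. */
static void *add_apple(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    apples++;
    pthread_cond_signal(&more_apples);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts with one orange and no apples; returns the pairs eaten. */
int run_fruit_demo(void)
{
    pthread_t supplier;

    apples = 0;
    oranges = 1;
    pairs_eaten = 0;
    int rc = pthread_create(&supplier, NULL, add_apple, NULL);
    assert(rc == 0);
    (void)rc;
    EatJuicyFruit();
    pthread_join(supplier, NULL);
    return pairs_eaten;
}
```

With the original loop body, this scenario can hang: after the apple arrives, the eater would go on to wait for more_oranges, which nobody ever signals.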
Thread-specific data (TSD) (lots of typos)
“Variable that is global in scope (to all functions) but having different values for each thread”
Identified by “key” (key_create func)
Each TS variable needs its own key
Use “setspecific” and “getspecific” with key:value pair
Benefit: lower-level funcs can access these values without passing them down as args
Drawbacks to TSD
Not terribly efficient since accessed via function call
“Don’t place in inner loops”
Can set up per-key destructor function
Useful for OOP (C++)
When thread exits, automatically called
C++11 has TSD
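A hedged sketch of thread-specific data with a per-key destructor. The key holds a per-thread id that a lower-level function, current_id, reads without receiving it as an argument. All names here (id_key, free_id, current_id, worker, run_tsd_demo, destructor_calls) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_THREADS 3   /* hypothetical */

static pthread_key_t id_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t freed_lock = PTHREAD_MUTEX_INITIALIZER;
static int freed = 0;

/* Per-key destructor: called at thread exit if the thread's value is non-NULL. */
static void free_id(void *value)
{
    free(value);
    pthread_mutex_lock(&freed_lock);
    freed++;
    pthread_mutex_unlock(&freed_lock);
}

static void make_key(void)
{
    pthread_key_create(&id_key, free_id);
}

/* Lower-level function: finds the calling thread's id with no argument. */
static int current_id(void)
{
    int *id = pthread_getspecific(id_key);
    return id != NULL ? *id : -1;
}

static void *worker(void *arg)
{
    int *id = malloc(sizeof *id);
    if (id == NULL)
        return (void *)(intptr_t)-1;
    *id = (int)(intptr_t)arg;
    pthread_setspecific(id_key, id);        /* same key, per-thread value */
    return (void *)(intptr_t)(current_id() * 10);
}

/* Returns the sum of the per-thread results. */
int run_tsd_demo(void)
{
    pthread_t tid[NUM_THREADS];
    int sum = 0;

    pthread_once(&key_once, make_key);      /* create the key exactly once */
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)(i + 1));
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        void *status = NULL;
        pthread_join(tid[i], &status);
        sum += (int)(intptr_t)status;
    }
    return sum;
}

/* Number of destructor calls observed so far. */
int destructor_calls(void)
{
    pthread_mutex_lock(&freed_lock);
    int n = freed;
    pthread_mutex_unlock(&freed_lock);
    return n;
}
```

Each getspecific is a function call, which is why the slide advises keeping such accesses out of inner loops; a local copy of the value would serve there.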
Safety issues
Deadlocks (familiar from CIS*3110)
Lock hierarchies:
When thread needs to acquire more than one mutex at a time
Make rule that they be acquired in consistent order (e.g. alphabetical by variable name)
Prevents circular wait
Unfortunately no easy way to enforce!
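A lock hierarchy can be sketched with two accounts whose mutexes are always acquired in ascending array-index order, whichever direction the transfer goes. Ordering by index is an assumption made for this example (the slide suggests ordering by variable name); the account structure and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TRANSFERS 10000   /* hypothetical */

struct account {
    pthread_mutex_t lock;
    long balance;
};

static struct account accounts[2] = {
    { PTHREAD_MUTEX_INITIALIZER, 1000 },
    { PTHREAD_MUTEX_INITIALIZER, 1000 },
};

/* Moves amount between two DIFFERENT accounts, locking lower index first. */
static void transfer(int from, int to, long amount)
{
    int first = from < to ? from : to;
    int second = from < to ? to : from;

    pthread_mutex_lock(&accounts[first].lock);
    pthread_mutex_lock(&accounts[second].lock);
    accounts[from].balance -= amount;
    accounts[to].balance += amount;
    pthread_mutex_unlock(&accounts[second].lock);
    pthread_mutex_unlock(&accounts[first].lock);
}

/* Each mover repeatedly transfers 1 unit out of "its" account. */
static void *mover(void *arg)
{
    int from = (int)(intptr_t)arg;
    int to = 1 - from;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(from, to, 1);
    return NULL;
}

/* Runs two movers in opposite directions; returns the total balance. */
long run_transfers(void)
{
    pthread_t tid[2];

    accounts[0].balance = 1000;
    accounts[1].balance = 1000;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tid[i], NULL, mover, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return accounts[0].balance + accounts[1].balance;
}
```

If transfer() instead locked "from" first and "to" second, the two movers could each hold one mutex while waiting for the other: the circular wait the rule is meant to prevent. Nothing in the API stops a caller from bypassing transfer() and locking in the wrong order.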
Monitors
Very “OO”: encapsulates shared data with methods that manipulate it
Methods take care of acquiring needed mutexes, deal with cond. vars.
Prevents programmer logic errors leading to deadlock or violation of C/S by hiding mutex/cv’s
Not provided directly in pthread.h
Build yourself using cond. vars. (pattern in book)
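A monitor can be approximated in C by a struct that hides its mutex and condition variable behind a few functions. This is a sketch under assumptions and not the book's pattern: the stock_monitor type and the stock_* functions are hypothetical, and callers are merely trusted not to touch the fields directly, since C has no access control.

```c
#include <assert.h>
#include <pthread.h>

/* Monitor-style object: shared data plus the synchronization that guards it. */
struct stock_monitor {
    pthread_mutex_t lock;
    pthread_cond_t restocked;
    long units;
};

void stock_init(struct stock_monitor *m, long initial)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->restocked, NULL);
    m->units = initial;
}

/* "Method": adds units and wakes anyone waiting for stock. */
void stock_add(struct stock_monitor *m, long n)
{
    pthread_mutex_lock(&m->lock);
    m->units += n;
    pthread_cond_broadcast(&m->restocked);
    pthread_mutex_unlock(&m->lock);
}

/* "Method": blocks until n units are available; returns what is left. */
long stock_take(struct stock_monitor *m, long n)
{
    pthread_mutex_lock(&m->lock);
    while (m->units < n)                    /* recheck after every wakeup */
        pthread_cond_wait(&m->restocked, &m->lock);
    m->units -= n;
    long left = m->units;
    pthread_mutex_unlock(&m->lock);
    return left;
}

static void *restock(void *p)
{
    struct stock_monitor *m = p;
    for (int i = 0; i < 3; i++)
        stock_add(m, 50);
    return NULL;
}

/* One thread adds 3 x 50 units; the caller takes 120. Returns units left. */
long run_monitor_demo(void)
{
    struct stock_monitor m;
    pthread_t tid;

    stock_init(&m, 0);
    int rc = pthread_create(&tid, NULL, restock, &m);
    assert(rc == 0);
    (void)rc;
    long left = stock_take(&m, 120);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&m.restocked);
    pthread_mutex_destroy(&m.lock);
    return left;
}
```

Client code never calls lock, unlock, wait or signal itself, so it has no chance to forget an unlock or skip the recheck loop.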