CS2403 Programming Languages
Concurrency
Chung-Ta King, Department of Computer Science, National Tsing Hua University
(Slides are adapted from Concepts of Programming Languages, R.W. Sebesta)
Outline
Parallel architecture and programming
Language supports for concurrency:
Controlling concurrent tasks
Sharing data
Synchronizing tasks
Sequential Computing
The von Neumann architecture, with its Program Counter (PC), dictates sequential execution
Traditional programming thus follows a single thread of control: the sequence of program points reached as control flows through the program
(Figure: program counter; Introduction to Parallel Computing, Blaise Barney)
Sequential Programming Dominates
Sequential programming has dominated throughout computing history
Why? Why has there been no need to change the programming style?
Two Factors Help to Maintain Performance
IC technology: ever-shrinking feature size (Moore's law), faster switching, more functionality
Architectural innovations to remove bottlenecks in the von Neumann architecture:
Memory hierarchy for reducing memory latency: registers, caches, scratchpad memory
Hiding or tolerating memory latency: multithreading, prefetching, predication, speculation
Executing multiple instructions in parallel: pipelining, multiple issue (in-/out-of-order, VLIW), SIMD multimedia extensions (instruction-level parallelism, ILP)
(Prof. Mary Hall, Univ. of Utah)
End of Sequential Programming?
It has become infeasible to keep improving uniprocessor performance: power, clocking, ...
Multicore architectures prevail (homogeneous or heterogeneous), achieving performance gains with multiple simpler processors
Sequential programming is still alive! Why? Throughput versus execution time
Can we live with sequential programming forever?
Parallel Programming
A programming style that specifies concurrency (control structure) and interaction (communication structure) between concurrent subtasks, still in an imperative language style
Concurrency can be expressed at various levels of granularity: machine instruction level, high-level language statement level, unit level, program level
Different models assume different architectural support, so we look at parallel architectures first
(Ananth Grama, Purdue Univ.)
An Abstract Parallel Architecture
How is parallelism managed? Where is the memory physically located? What is the connectivity of the network?
(Prof. Mary Hall, Univ. of Utah)
Flynn’s Taxonomy of Parallel Arch.
Distinguishes parallel architectures by their instruction and data streams
SISD: the classical uniprocessor architecture
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
(Introduction to Parallel Computing, Blaise Barney)
Parallel Control Mechanisms
(Prof. Mary Hall, Univ. of Utah)
2 Classes of Parallel Architecture
Shared memory multiprocessor architectures: multiple processors can operate independently but share the same memory system
They share a global address space in which each processor can access every memory location
Changes made to a memory location by one processor are visible to all other processors, like a bulletin board
(Introduction to Parallel Computing, Blaise Barney; Prof. Mary Hall, Univ. of Utah)
2 Classes of Parallel Architecture
Distributed memory architectures: processing elements (PEs) connected by an interconnect
Each PE has its own distinct address space; there is no global address space, and PEs communicate explicitly to exchange data
Example: PC clusters connected by commodity Ethernet
(Introduction to Parallel Computing, Blaise Barney; Prof. Mary Hall, Univ. of Utah)
Shared Memory Programming
Often organized as a collection of threads of control
Each thread has private data, e.g., a local stack, and a set of shared variables, e.g., the global heap
Threads communicate implicitly by writing and reading shared variables
Threads coordinate through locks and barriers implemented using shared variables
(Prof. Mary Hall, Univ. of Utah)
Distributed Memory Programming
Organized as named processes
A process is a thread of control plus a local address space -- NO shared data
A process cannot see the memory contents of other processes, nor can it address or access them
Logically shared data is partitioned over the processes
Processes communicate by explicit send/receive, i.e., by asking the destination process to access its local data on behalf of the requesting process
Coordination is implicit in communication events: blocking/non-blocking send and receive
(Prof. Mary Hall, Univ. of Utah)
Distributed Memory Programming
Each process's private memory looks like a mailbox
(Prof. Mary Hall, Univ. of Utah)
Specifying Concurrency
What language supports are needed for parallel programming?
Specifying (parallel) control flows:
How to create, start, suspend, resume, and stop processes/threads?
How to let one process/thread explicitly wait for events or for another process/thread?
Specifying data flows among parallel flows:
How to pass data generated by one process/thread to another process/thread?
How to let multiple processes/threads access common resources, e.g., a counter, without conflicts?
Specifying Concurrency
Many parallel programming systems provide libraries, and perhaps compiler pre-processors, to extend a traditional imperative language, such as C, for parallel programming
Examples: Pthreads, OpenMP, MPI, ...
Some languages have parallel constructs built directly into the language, e.g., Java, C#
So far, the library approach works fine
Shared Memory Prog. with Threads
Several thread libraries exist; Pthreads is the POSIX threading interface
POSIX: Portable Operating System Interface for UNIX
Interface to OS utilities; system calls to create and synchronize threads
OpenMP is a newer standard
Allows a programmer to separate a program into serial regions and parallel regions
Provides synchronization constructs
The compiler generates the thread program and synchronization
Extends Fortran, C, and C++ mainly through directives
(Prof. Mary Hall, Univ. of Utah)
Thread Basics
A thread is a program unit that can be in concurrent execution with other program units
Threads differ from ordinary subprograms:
When a program unit starts the execution of a thread, it is not necessarily suspended
When a thread's execution is completed, control may not return to the caller
All threads run in the same address space but have their own runtime stacks
Message Passing Prog. with MPI
MPI defines a standard library for message passing that can be used to develop portable message-passing programs in C or Fortran
Based on the Single Program, Multiple Data (SPMD) model
All communication and synchronization require subroutine calls; there are no shared variables
On each processor, the program runs just like any uniprocessor program, except for the calls to the message-passing library
It is possible to write fully functional message-passing programs using only six routines (MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, MPI_Send, and MPI_Recv)
(Prof. Mary Hall, Univ. of Utah; Prof. Ananth Grama, Purdue Univ. )
Message Passing Basics
The computing system consists of p processes, each with its own exclusive address space
Each data element must belong to one of the partitions of the space; hence, data must be explicitly partitioned and placed
All interactions (read-only or read/write) require the cooperation of two processes: the process that has the data and the one that wants to access it
All processes execute asynchronously unless they interact through send/receive synchronizations
(Prof. Ananth Grama, Purdue Univ. )
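To make the explicit partitioning concrete, the sketch below (an illustration, not taken from the slides; N and the block distribution are assumptions) shows each of the p processes computing, from its rank, the range of array indices it owns:

  /* Hypothetical sketch: block-partitioning an array over MPI processes. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[]) {
      int rank, size;
      const int N = 1000;                          /* total number of elements (assumed) */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      int chunk = (N + size - 1) / size;           /* elements per process, rounded up */
      int lo = rank * chunk;                       /* first index owned by this rank */
      int hi = (lo + chunk < N) ? lo + chunk : N;  /* one past the last owned index */
      printf("Process %d owns elements [%d, %d)\n", rank, lo, hi);
      MPI_Finalize();
      return 0;
  }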
Controlling Concurrent Tasks
Pthreads: the program starts with a single master thread, from which other threads are created
  errcode = pthread_create(&thread_id, &thread_attribute, &thread_fun, &fun_arg);
Each thread executes a specific function, thread_fun(), representing the thread's computation
All threads execute in parallel
The function pthread_join() suspends execution of the calling thread until the target thread terminates
(Prof. Mary Hall, Univ. of Utah)
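To make the calling convention concrete, here is a small sketch (not from the slides; square and value are made-up names) that passes an argument to the thread function and retrieves its result through pthread_join():

  /* Hypothetical sketch: passing an argument to a thread and joining for its result. */
  #include <pthread.h>
  #include <stdio.h>

  static void *square(void *arg) {
      int n = *(int *)arg;          /* read the argument passed by the creator */
      static int result;            /* storage that outlives the thread */
      result = n * n;
      return &result;               /* returned pointer is delivered to pthread_join */
  }

  int main(void) {
      pthread_t tid;
      int value = 7;
      void *res;
      pthread_create(&tid, NULL, square, &value);   /* default attributes, pass &value */
      pthread_join(tid, &res);                      /* wait for the thread, collect its result */
      printf("square(%d) = %d\n", value, *(int *)res);
      return 0;
  }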
Pthreads “Hello World!”
#include <pthread.h>
#include <stdio.h>   /* for printf */

void *thread(void *vargp);

int main() {
    pthread_t tid;
    pthread_create(&tid, NULL, thread, NULL);
    pthread_join(tid, NULL);
    pthread_exit((void *)NULL);
}

void *thread(void *vargp) {
    printf("Hello World from thread!\n");
    pthread_exit((void *)NULL);
}
(http://www.cs.binghamton.edu/~guydosh/cs350/hello.c)
Controlling Concurrent Tasks (cont.)
OpenMP: execution begins as a single process, which forks multiple threads to work on parallel blocks of code (single program, multiple data)
Parallel constructs are specified using pragmas
(Prof. Mary Hall, Univ. of Utah)
OpenMP Pragma
All pragmas begin with #pragma
For parallel loops, the compiler calculates the loop bounds for each thread and manages the data partitioning
Synchronization is also automatic (an implicit barrier)
(Prof. Mary Hall, Univ. of Utah)
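For example, a work-sharing loop could look like the following sketch (assumed for illustration; the array a and its size are made up); the runtime splits the iterations among the threads, and the implicit barrier at the end of the loop synchronizes them:

  /* Hypothetical sketch: OpenMP work-sharing loop with automatic partitioning. */
  #include <omp.h>
  #include <stdio.h>

  int main(void) {
      double a[1000];
      /* The compiler/runtime splits the 1000 iterations among the threads;
         an implicit barrier at the end of the loop synchronizes them. */
      #pragma omp parallel for
      for (int i = 0; i < 1000; i++) {
          a[i] = 2.0 * i;
      }
      printf("a[999] = %f\n", a[999]);
      return 0;
  }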
OpenMP “Hello World!”
#include <omp.h>
#include <stdio.h>    /* for printf */
#include <stdlib.h>   /* for EXIT_SUCCESS */

int main(int argc, char *argv[]) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World: %d\n", th_id);
        #pragma omp barrier
        if (th_id == 0) {
            nthreads = omp_get_num_threads();
            printf("%d threads\n", nthreads);
        }
    }
    return EXIT_SUCCESS;
}
(http://en.wikipedia.org/wiki/OpenMP#Hello_World)
Controlling Concurrent Tasks (cont.)
Java: the concurrent units in Java are methods named run
The code of a run method can be in concurrent execution with other such methods
The process in which a run method executes is called a thread

  class MyThread extends Thread {
      public void run() { ... }
  }
  ...
  Thread myTh = new MyThread();
  myTh.start();
Controlling Concurrent Tasks (cont.)
The Java Thread class has several methods to control the execution of threads
The yield method is a request from the running thread to voluntarily surrender the processor
The sleep method blocks the thread that calls it
The join method forces a method to delay its execution until the run method of another thread has completed its execution
Controlling Concurrent Tasks (cont.)
Java thread priority: a thread's default priority is the same as that of the thread that created it
If main creates a thread, its default priority is NORM_PRIORITY
The Thread class defines two other priority constants, MAX_PRIORITY and MIN_PRIORITY
The priority of a thread can be changed with the setPriority method
Controlling Concurrent Tasks (cont.)
MPI: the programmer writes the code for a single process, and the compiler includes the necessary libraries
  mpicc -g -Wall -o mpi_hello mpi_hello.c
The execution environment starts the parallel processes
  mpiexec -n 4 ./mpi_hello
(Prof. Mary Hall, Univ. of Utah)
MPI “Hello World!”
#include "mpi.h"
#include <stdio.h>   /* for printf */

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello World from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
(Prof. Mary Hall, Univ. of Utah)
Sharing Data
Pthreads:
Variables declared outside of main are shared
Objects allocated on the heap may be shared (if a pointer to them is passed)
Variables on the stack are private; passing pointers to them around to other threads can cause problems
Shared variables can be read and written directly by all threads, so synchronization is needed to prevent races
Synchronization primitives, e.g., semaphores, locks, mutexes, and barriers, are used to sequence the executions of the threads and thereby indirectly sequence the data passed through shared variables
(Prof. Mary Hall, Univ. of Utah)
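A small sketch of these sharing rules (assumed for illustration; the names shared_counter, heap_buf, and worker are not from the slides):

  /* Hypothetical sketch: what is shared and what is private in a Pthreads program. */
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>

  int shared_counter = 0;              /* declared outside main: shared by all threads */

  void *worker(void *arg) {
      int *heap_buf = (int *)arg;      /* heap object: shared because its pointer was passed */
      int local = 10;                  /* on this thread's stack: private to this thread */
      heap_buf[0] += local;            /* with several threads, writes like these need a lock */
      shared_counter++;
      return NULL;
  }

  int main(void) {
      pthread_t tid;
      int *heap_buf = malloc(4 * sizeof(int));   /* allocated on the heap */
      heap_buf[0] = 0;
      pthread_create(&tid, NULL, worker, heap_buf);
      pthread_join(tid, NULL);
      printf("heap_buf[0] = %d, shared_counter = %d\n", heap_buf[0], shared_counter);
      free(heap_buf);
      return 0;
  }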
Sharing Data (cont.)
OpenMP:
Variables in a shared clause are shared; the default is shared
Variables in a private clause are private
The loop index is private

  int bigdata[1024];

  void* foo(void* bar) {
      int tid;
      #pragma omp parallel \
          shared(bigdata) private(tid)
      {
          /* Calculation goes here */
      }
  }
(Prof. Mary Hall, Univ. of Utah)
Sharing Data (cont.)
MPI:

  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, buf;
      MPI_Status status;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {           /* process 0 sends the value to process 1 */
          buf = 123456;
          MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {    /* process 1 receives it */
          MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
      }
      MPI_Finalize();
      return 0;
  }
(Prof. Mary Hall, Univ. of Utah)
Synchronizing Tasks
A mechanism that controls the order in which tasks execute
Two kinds of synchronization:
Cooperation: one task waits for another, e.g., for passing data
  task 1: a = ...          task 2: ... = ... a ...
Competition: tasks compete for exclusive use of a resource, without a specific order
  task 1: sum += local_sum          task 2: sum += local_sum
Synchronizing Tasks (cont.)
Pthreads provides various synchronization primitives, e.g., mutexes, semaphores, and barriers
Mutex: protects critical sections -- segments of code that must be executed by only one thread at any time; protecting the code indirectly protects the shared data
Semaphore: synchronizes between two threads using sem_post() and sem_wait() (see the sketch below)
Barrier: synchronizes threads so that all reach the same point in the code before any goes further
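A minimal cooperation sketch with a POSIX semaphore (assumed for illustration; data_ready, produce, and consume are made-up names, not from the slides); the consumer blocks in sem_wait() until the producer calls sem_post():

  /* Hypothetical sketch: one thread signals another with a POSIX semaphore. */
  #include <pthread.h>
  #include <semaphore.h>
  #include <stdio.h>

  sem_t data_ready;        /* counts how many items are ready to be consumed */
  int shared_value;

  void *produce(void *arg) {
      shared_value = 42;           /* write the data first */
      sem_post(&data_ready);       /* then signal the consumer */
      return NULL;
  }

  void *consume(void *arg) {
      sem_wait(&data_ready);       /* block until the producer has posted */
      printf("consumed %d\n", shared_value);
      return NULL;
  }

  int main(void) {
      pthread_t p, c;
      sem_init(&data_ready, 0, 0);            /* initial count 0: nothing ready yet */
      pthread_create(&c, NULL, consume, NULL);
      pthread_create(&p, NULL, produce, NULL);
      pthread_join(p, NULL);
      pthread_join(c, NULL);
      sem_destroy(&data_ready);
      return 0;
  }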
Pthreads Mutex Example
pthread_mutex_t sum_lock;
int sum;

main() {
    ...
    pthread_mutex_init(&sum_lock, NULL);
    ...
}

void *find_min(void *list_ptr) {
    int my_sum;
    /* my_sum is computed from this thread's portion of the list (elided) */
    pthread_mutex_lock(&sum_lock);     /* enter critical section */
    sum += my_sum;
    pthread_mutex_unlock(&sum_lock);   /* leave critical section */
}
Synchronizing Tasks (cont.)
OpenMP: OpenMP has a reduction operation

  sum = 0;
  #pragma omp parallel for reduction(+:sum)
  for (i = 0; i < 100; i++) {
      sum += array[i];
  }

OpenMP also has a critical directive; the enclosed code is executed by all threads, but by only one thread at a time

  #pragma omp critical [(name)] new-line
  sum = sum + 1;
(Prof. Mary Hall, Univ. of Utah)
Synchronizing Tasks (cont.)
Java: a method that includes the synchronized modifier disallows any other synchronized method from running on the same object while it is in execution

  public synchronized void deposit(int i) {...}
  public synchronized int fetch() {...}

The above two methods are synchronized, which prevents them from interfering with each other
Synchronizing Tasks (cont.)
Java: cooperation synchronization is achieved via the wait, notify, and notifyAll methods
All of these methods are defined in Object, which is the root class in Java, so all objects inherit them
The wait method must be called in a loop that re-tests the condition being waited on
The notify method is called to tell one waiting thread that the event it was waiting for has happened
The notifyAll method awakens all of the threads on the object's wait list
Synchronizing Tasks (cont.)
MPI: send/receive are used to accomplish task synchronization, but the semantics of send/receive have to be specialized
Non-blocking send/receive: send() and receive() calls return no matter whether the data has arrived (see the sketch below)
Blocking send/receive:
Unbuffered blocking send() does not return until the matching receive() is encountered at the receiving process
Buffered blocking send() returns after the sender has copied the data into the designated buffer
Blocking receive() forces the receiving process to wait
(Prof. Ananth Grama, Purdue Univ. )
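As a sketch of the non-blocking case described above (an illustration assumed here, not taken from the slides), the calls below return immediately and MPI_Wait supplies the synchronization point:

  /* Hypothetical sketch: non-blocking MPI_Isend/MPI_Irecv completed with MPI_Wait. */
  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char *argv[]) {
      int rank, buf = 0;
      MPI_Request req;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
          buf = 123456;
          MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);  /* returns immediately */
          /* ... unrelated computation can overlap the communication here ... */
          MPI_Wait(&req, MPI_STATUS_IGNORE);    /* now it is safe to reuse buf */
      } else if (rank == 1) {
          MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);  /* returns immediately */
          MPI_Wait(&req, MPI_STATUS_IGNORE);    /* block here until the data has arrived */
          printf("received %d\n", buf);
      }
      MPI_Finalize();
      return 0;
  }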
Unbuffered Blocking
(Prof. Ananth Grama, Purdue Univ. )
Buffered Blocking
(Prof. Ananth Grama, Purdue Univ. )
Summary
Concurrent execution can be at the instruction, statement, subprogram, or program level
Two fundamental programming styles: shared variables and message passing
Programming languages must provide support for specifying control and data flows