Threads
What do we have so far
The basic unit of CPU utilization is a process. To run a program (a sequence of code), create a
process. Processes are well protected from one another. Switching between processes is fairly expensive. Communication between processes goes through
inter-process communication (IPC) mechanisms. Running a process requires the memory management
system. Process I/O is done by the I/O subsystem.
A process for every concurrency need? Consider developing a PC game:
Different code sequences for different characters (soldiers, cities, airplanes, cannons, user-controlled heroes).
Each character is more or less independent, so we could create a process for each character. Any drawbacks?
The action of a character usually depends on the game state (e.g., the locations of other characters).
What does this imply for a process-based implementation of characters?
A lot of context switching.
What do we really need for a PC game?
A way to run different sequences of code (threads of control) for different characters. Processes do this.
A way for different threads of control to share data effectively. Processes are NOT designed to do this.
Protection is not very important; a game is one application anyway. A process is overkill.
Switching between threads of control must be as efficient as possible. Process context switching is known to be expensive!!!
Threads are created to do all of the above.
Thread
Process context:
Process ID, process group ID, user ID, and group ID
Environment
Working directory
Program instructions
Registers (including PC)
Stack
Heap
File descriptors
Signal actions
Shared libraries
Inter-process communication tools
What is absolutely needed to run a sequence of code (a thread of control)?
Process/Thread context
What is absolutely needed to support a stream of instructions, given the process context? Only the per-stream state: registers (including the PC) and a stack. Everything else in the list above can be shared.
Process and Thread
Threads
Threads execute within a process. Sharing the process context means
easy inter-thread communication. There is no protection among threads, but threads are intended to be
cooperative.
Thread context: PC, registers, a stack, misc. info. Much smaller than the process context!!
Faster context switching. Faster thread creation.
Threads
Threads are also called light-weight processes; traditional processes are considered heavy-weight. A process can be single-threaded or multithreaded. Threads have become so ubiquitous that almost all modern
computing systems use the thread as the basic unit of CPU utilization.
More about threads
OS view: a thread is an independent stream of instructions that can be scheduled to run by the OS.
Software developer view: a thread can be thought of as a "procedure" that runs independently of the main program.
Sequential program: a single stream of instructions in a program.
Multi-threaded program: a program with multiple streams. Multiple threads are needed to use multiple cores/CPUs.
Threads…
Exist within processes.
Die if the process dies.
Use process resources.
Duplicate only the essential resources needed for the OS to schedule them independently.
Each thread maintains:
Stack
Registers
Scheduling properties (e.g. priority)
Set of pending and blocked signals (to allow different threads to react differently to signals)
Thread-specific data
The Pthreads (POSIX threads) API
Three types of routines:
Thread management: creating, terminating, joining, and detaching threads.
Mutexes: creating, destroying, locking, and unlocking mutexes (mutual exclusion).
Condition variables: event-driven synchronization.
Mutexes and condition variables are concerned with synchronization.
Why is there nothing for inter-thread communication? Threads share the process's memory, so ordinary shared variables are the communication channel; only the synchronization around them needs API support.
The concept of opaque objects pervades the design of the API.
The Pthreads API naming convention
Routine Prefix        Function
pthread_              general thread routines
pthread_attr_         thread attributes
pthread_mutex_        mutexes
pthread_mutexattr_    mutex attributes
pthread_cond_         condition variables
pthread_condattr_     condition variable attributes
pthread_key_          thread-specific data keys
Compiling pthread programs
Pthread header file: <pthread.h>. Compiling pthread programs: gcc aaa.c -lpthread
Thread management routines
Creation: pthread_create
Termination: return from the thread function, or call pthread_exit. Can we still use exit? No: exit() terminates the whole process, including all of its threads.
Wait (parent/child synchronization): pthread_join
Creation
Thread equivalent of fork()
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
Returns 0 if OK, and a non-zero error number on failure.
Parameters for the thread routine are passed through void *arg. What if we want to pass a structure? Pass a pointer to it, cast to void *.
Termination
Thread termination: return from the initial function, or call void pthread_exit(void *status).
Process termination: exit() called by any thread, or main() returns.
Waiting for child thread
int pthread_join(pthread_t tid, void **status);
The equivalent of waitpid() for processes.
Detaching a thread
A detached thread can act as a daemon thread; the parent thread doesn't need to wait for it.
int pthread_detach(pthread_t tid);
Detaching self: pthread_detach(pthread_self());
Some multi-thread program examples
A multi-thread program example: example1.c. Making multiple producers: example2.c.
What is going on in this program? How do we fix it?
Matrix multiply and threaded matrix multiply
Matrix multiply: C = A × B
C[i, j] = Σ_{k=1}^{N} A[i, k] × B[k, j]
A[0,0],   A[0,1],   ..., A[0,N-1]
A[1,0],   A[1,1],   ..., A[1,N-1]
.................................
A[N-1,0], A[N-1,1], ..., A[N-1,N-1]

B[0,0],   B[0,1],   ..., B[0,N-1]
B[1,0],   B[1,1],   ..., B[1,N-1]
.................................
B[N-1,0], B[N-1,1], ..., B[N-1,N-1]

C[0,0],   C[0,1],   ..., C[0,N-1]
C[1,0],   C[1,1],   ..., C[1,N-1]
.................................
C[N-1,0], C[N-1,1], ..., C[N-1,N-1]
Matrix multiply and threaded matrix multiply
Sequential code:
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    for (k = 0; k < N; k++)
      C[i][j] = C[i][j] + A[i][k] * B[k][j];
Threaded code (mm_pthread.c): the calculation of C[i,j] does not depend on any other entry of C, so the entries can be computed in parallel.
[Figure: the N×N result matrix C, shown again for the threaded version]
PI calculation
Sequential code: pi.c Threaded version: homework
π = lim_{n→∞} Σ_{i=1}^{n} (1/n) · 4.0 / (1 + ((i − 0.5)/n)²)
Type of Threads
Independent threads Cooperative threads
Independent Threads
No state shared with other threads.
Deterministic computation: the output depends only on the input.
Reproducible: the output does not depend on the order and timing of other threads.
Scheduling order does not matter.
Cooperating Threads
Shared state. Nondeterministic. Nonreproducible.
Example: two threads sharing the same display:
Thread A: cout << "ABC";    Thread B: cout << "123";
You may get "A12BC3"
So, Why Allow Cooperating Threads?
Shared resources, e.g., a single processor.
Speedup: occurs when threads use different resources at different times (mm_pthread.c).
Modularity: an application can be decomposed into threads.
Some Concurrent Programs
If threads work on separate data, scheduling does not matter
Thread A Thread B
x = 1; y = 2;
Some Concurrent Programs
If threads share data, the final values are not as obvious
Thread A Thread B
x = 1; y = 2;
x = y + 1; y = y * 2;
What are the indivisible operations?
Atomic Operations
An atomic operation always runs to completion; it is all or nothing. E.g., memory loads and stores on most
machines. Many operations are not atomic,
e.g., a double-precision floating-point store on a 32-bit machine.
Suppose…
Each C statement is atomic. Let's revisit the example…
All Possible Execution Orders
Thread A Thread B
x = 1; y = 2;
x = y + 1; y = y * 2;
At each step the scheduler may run the next statement of either thread, so the possible executions form a decision tree. Respecting each thread's program order, there are six possible interleavings:
x = 1;  x = y + 1;  y = 2;      y = y * 2
x = 1;  y = 2;      x = y + 1;  y = y * 2
x = 1;  y = 2;      y = y * 2;  x = y + 1
y = 2;  x = 1;      x = y + 1;  y = y * 2
y = 2;  x = 1;      y = y * 2;  x = y + 1
y = 2;  y = y * 2;  x = 1;      x = y + 1
All Possible Execution Orders
Thread A Thread B
x = 1; y = 2;
x = y + 1; y = y * 2;
Starting from unknown initial values (x = ?, y = ?), the six interleavings end in:
x = 1;  x = y + 1;  y = 2;      y = y * 2   →  (x = ?, y = 4)   [x = y + 1 read the initial y]
x = 1;  y = 2;      x = y + 1;  y = y * 2   →  (x = 3, y = 4)
x = 1;  y = 2;      y = y * 2;  x = y + 1   →  (x = 5, y = 4)
y = 2;  x = 1;      x = y + 1;  y = y * 2   →  (x = 3, y = 4)
y = 2;  x = 1;      y = y * 2;  x = y + 1   →  (x = 5, y = 4)
y = 2;  y = y * 2;  x = 1;      x = y + 1   →  (x = 5, y = 4)
Another Example
Assume each C statement is atomic
Thread A              Thread B
j = 0;                j = 0;
while (j < 10) {      while (j > -10) {
    ++j;                  --j;
}                     }
cout << "A wins";     cout << "B wins";
So…
Who wins? Can the computation go on forever? Either thread can win, and in principle the loops can run forever if the scheduler keeps interleaving the increments and decrements so that j never reaches 10 or -10.
Race conditions occur when threads share data and their results depend on the timing of their executions.
The take-home point: sharing data between threads can cause problems; we need mechanisms to deal with such situations.