Introduction to Threads


Page 1: Introduction to Threads

4.1

Introduction to Threads

Overview

Multithreading Models

Thread Libraries

Threading Issues

Operating System Examples

Windows XP Threads

Linux Threads

Page 2: Introduction to Threads

4.2

Threads

A Thread is just a sequence of instructions to execute

Threads share the same memory space as other threads in the same application – so they automatically share data and variables.

Threads can run on different processor cores on a multicore processor – this makes applications faster and more responsive

Even on a single-core processor, threads make an application more responsive – if one thread blocks waiting for I/O, the other threads can still run

Each process has its own virtual memory address space, so the OS takes much longer to switch between processes than between threads. Sharing data between processes also requires extra steps and overhead, so processes cost far more than threads in many applications. Most applications therefore use one process containing several threads.

In C/C++, a thread typically runs the code in a C/C++ function and a special API call starts up a new thread running that function.
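
For example, a minimal sketch using the POSIX pthread_create() call (the function name worker and the printed message are illustrative, not from the slides):

    #include <pthread.h>
    #include <stdio.h>

    /* the new thread runs this function */
    void *worker(void *arg) {
        printf("hello from thread %d\n", *(int *)arg);
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        int id = 1;
        pthread_create(&tid, NULL, worker, &id);  /* start a new thread running worker() */
        pthread_join(tid, NULL);                  /* wait for it to finish */
        return 0;
    }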

Page 3: Introduction to Threads

4.3

Single and Multithreaded Processes

Page 4: Introduction to Threads

4.4

Benefits of Threads

Responsiveness

Applications can run up to N times faster on an N core processor

Resource Sharing

Economy

Scalability

Page 5: Introduction to Threads

4.5

Multicore Programming

An application runs on only one processor core – unless it uses multiple threads

Multicore systems put more pressure on programmers to use threads. Challenges for multithreaded applications include:

Dividing activities

Balancing the Computational Load

Data splitting

Data dependency

Testing and debugging

Page 6: Introduction to Threads

4.6

Concurrent Execution on a Single-core System

The OS can time-slice among the four threads T1…T4

Page 7: Introduction to Threads

4.7

Parallel Execution on a Multicore System

The OS can time-slice the four threads T1…T4 across two processor cores. Two threads can run in parallel on different cores, so the application could run up to twice as fast. Without threads, an application can run on only one core!

Page 8: Introduction to Threads

4.8

User Threads

Thread management done by a user-level threads library

Three primary thread libraries:

POSIX Pthreads

Win32 threads

Java and C# threads

A simplified thread library wrapper called GThreads will be used in the last lab on Jinx

Page 9: Introduction to Threads

4.9

Thread Libraries

A thread library provides the programmer with an API for creating and managing threads

Two primary ways of implementing

Library entirely in user space

Kernel-level library supported by the OS

Page 10: Introduction to Threads

4.10

Pthreads

A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization

The API specifies the behavior of the thread library; the implementation is up to the developers of the library

Common in UNIX operating systems (Solaris, Linux, Mac OS X)

Can also be added to Windows by installing the optional Pthreads library

Page 11: Introduction to Threads

4.11

Java and C# Threads

Thread support is built into these newer languages with keywords

Java threads are managed by the JVM

C# thread support is in the .NET Framework, whose CLR runtime plays the same role as the JVM

Typically implemented using the threads model provided by the underlying OS

Java and C# threads may be created by:

Extending Thread class

Implementing the Runnable interface

Page 12: Introduction to Threads

4.12

Threading Issues

Semantics of fork() and exec() system calls

Thread cancellation of target thread

Asynchronous or deferred

Signal handling

Thread pools

Thread-specific data

Scheduler activations

Page 13: Introduction to Threads

4.13

Thread Cancellation

Terminating a thread before it has finished

Two general approaches:

Asynchronous cancellation terminates the target thread immediately

Deferred cancellation allows the target thread to periodically check if it should be cancelled
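
A minimal sketch of deferred cancellation with Pthreads (the worker loop is illustrative): the target thread only acts on a pending cancel request at explicit cancellation points such as pthread_testcancel().

    #include <pthread.h>

    void *worker(void *arg) {
        /* deferred is the default cancel type; set it explicitly for clarity */
        pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
        for (;;) {
            /* ... do one unit of work ... */
            pthread_testcancel();  /* a pending pthread_cancel() takes effect here */
        }
        return NULL;
    }

    /* elsewhere, another thread requests cancellation with pthread_cancel(tid) */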

Page 14: Introduction to Threads

4.14

Signal Handling

Signals are used in UNIX systems to notify a process that a particular event has occurred

A signal handler is used to process signals

1. Signal is generated by particular event

2. Signal is delivered to a process

3. Signal is handled

Options:

Deliver the signal to the thread to which the signal applies

Deliver the signal to every thread in the process

Deliver the signal to certain threads in the process

Assign a specific thread to receive all signals for the process
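
The last option can be sketched with Pthreads, assuming SIGUSR1 as the signal of interest: every thread blocks the signal, and one dedicated thread receives it synchronously with sigwait().

    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>

    static void *signal_thread(void *arg) {
        sigset_t set;
        int sig;
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        for (;;) {
            sigwait(&set, &sig);             /* receive the signal synchronously */
            printf("got signal %d\n", sig);  /* handle it */
        }
        return NULL;
    }

    int main(void) {
        sigset_t set;
        pthread_t tid;
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &set, NULL);  /* block it here; new threads inherit the mask */
        pthread_create(&tid, NULL, signal_thread, NULL);
        pthread_join(tid, NULL);
        return 0;
    }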

Page 15: Introduction to Threads

4.15

Thread Pools

Create a number of threads in a pool where they await work

Advantages:

Usually slightly faster to service a request with an existing thread than to create a new thread

Allows the number of threads in the application(s) to be bound to the size of the pool
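
A minimal thread-pool sketch with Pthreads (the names pool_submit, pool_start, and the sizes NWORKERS and QSIZE are illustrative): a fixed set of pre-created worker threads waits on a bounded task queue.

    #include <pthread.h>

    #define NWORKERS 4
    #define QSIZE    16

    typedef void (*task_fn)(void *);
    typedef struct { task_fn fn; void *arg; } task_t;

    static task_t queue[QSIZE];
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

    /* each pooled thread loops forever, waiting for work */
    static void *worker(void *unused) {
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0)
                pthread_cond_wait(&not_empty, &lock);
            task_t t = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
            t.fn(t.arg);  /* run the task outside the lock */
        }
        return NULL;
    }

    /* submitting work reuses an existing thread instead of creating one */
    void pool_submit(task_fn fn, void *arg) {
        pthread_mutex_lock(&lock);
        while (count == QSIZE)
            pthread_cond_wait(&not_full, &lock);
        queue[tail] = (task_t){ fn, arg };
        tail = (tail + 1) % QSIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    void pool_start(void) {
        for (int i = 0; i < NWORKERS; i++) {
            pthread_t tid;
            pthread_create(&tid, NULL, worker, NULL);
        }
    }

Blocking pool_submit when the queue is full is one policy choice; a real pool would also need a shutdown path.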

Page 16: Introduction to Threads

4.16

Windows Threads

Implements the one-to-one mapping, with kernel-level threads

Each thread contains

A thread id

Register set

Separate user and kernel stacks

Private data storage area

The register set, stacks, and private storage area are known as the context of the thread

Page 17: Introduction to Threads

4.17

Linux Threads

Linux refers to them as tasks rather than threads

Thread creation is done through clone() system call

clone() allows a child task to share the address space of the parent task (process)
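
A Linux-specific sketch of clone() (the variable names are illustrative): with CLONE_VM, the child task writes to a variable that the parent can see, because the two tasks share one address space.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    static int shared = 0;

    static int child_fn(void *arg) {
        shared = 42;  /* visible to the parent: the address space is shared */
        return 0;
    }

    int main(void) {
        const size_t STACK_SIZE = 64 * 1024;
        char *stack = malloc(STACK_SIZE);
        /* stacks grow downward on most architectures, so pass the top of the block */
        int pid = clone(child_fn, stack + STACK_SIZE,
                        CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, NULL);
        waitpid(pid, NULL, 0);
        printf("shared = %d\n", shared);  /* prints 42 */
        free(stack);
        return 0;
    }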

Page 18: Introduction to Threads

Background on the need for Synchronization

• Threads may need to wait for other threads to finish an operation

• Additionally, concurrent access to shared data by threads may result in data inconsistency (i.e., incorrect values)

• Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes (or threads)

Page 19: Introduction to Threads

Example Problem

• Suppose two threads share a common buffer array. The producer puts items into the buffer and the consumer removes them.

• A solution to the two-thread producer-consumer problem that can fill all of the buffer slots uses an integer count to track the number of full slots. Initially, count is set to 0. The producer increments it after producing a new item, and the consumer decrements it after consuming one.

Page 20: Introduction to Threads

Producer

while (true) {
    /* produce an item and put it in nextProduced */
    while (count == BUFFER_SIZE)
        ;  // do nothing
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}

Page 21: Introduction to Threads

Consumer

while (true) {
    while (count == 0)
        ;  // do nothing
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    // consume the item in nextConsumed
}

Page 22: Introduction to Threads

Critical Section

• The code segments that read and write global data shared between threads or processes are called “critical sections”

• Race-condition bugs on global variable values are possible – an example follows

• The OS synchronization API is used to solve this

• Must be careful to use the OS synchronization primitives to control access to a critical section, or hidden bugs will appear in the code

Page 23: Introduction to Threads

Race Condition on Count

• count++ could be implemented as

    register1 = count
    register1 = register1 + 1
    count = register1

• count-- could be implemented as

    register2 = count
    register2 = register2 - 1
    count = register2

• Consider this execution interleaving with count = 5 initially:

    S0: producer executes register1 = count          {register1 = 5}
    S1: producer executes register1 = register1 + 1  {register1 = 6}
    S2: consumer executes register2 = count          {register2 = 5}
    S3: consumer executes register2 = register2 - 1  {register2 = 4}
    S4: producer executes count = register1          {count = 6}
    S5: consumer executes count = register2          {count = 4}

The correct final value is 5; depending on the interleaving, count ends up as 6 or 4 instead.

Page 24: Introduction to Threads

Need an Atomic Operation

• The count++ and count-- code must run to completion before switching to the other thread, to avoid bugs

• An atomic operation here means a basic operation that cannot be stopped or interrupted in the middle to switch to another thread

• Race conditions will occur more readily on systems with multiple processors, since the threads really run in parallel
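
One way to get such an atomic operation, sketched with C11 atomics (not covered on the slides): an atomic fetch-and-add cannot be interrupted partway, so no interleaving can lose an update.

    #include <stdatomic.h>

    atomic_int count = 0;

    void producer_step(void) { atomic_fetch_add(&count, 1); /* atomic count++ */ }
    void consumer_step(void) { atomic_fetch_sub(&count, 1); /* atomic count-- */ }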

Page 25: Introduction to Threads

Solution to Critical-Section Problem

1. Mutual Exclusion (Mutex) – If process Pi is executing in its critical section, then no other processes can be executing in their critical sections

2. Progress – If no process is executing in its critical section and there exist some processes that wish to enter their critical sections, then the selection of the process that will enter its critical section next cannot be postponed indefinitely

3. Bounded Waiting – A bound must exist on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted

Assume that each process executes at a nonzero speed
No assumption concerning the relative speed of the N processes

Page 26: Introduction to Threads

Solution to Critical-Section Problem Using Mutex Locks

do {
    acquire lock
        critical section
    release lock
        remainder section
} while (TRUE);
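
The same pattern with a POSIX mutex protecting the shared counter from the earlier race-condition example (a sketch; the function name increment is illustrative):

    #include <pthread.h>

    static int count = 0;
    static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

    void increment(void) {
        pthread_mutex_lock(&count_lock);    /* acquire lock */
        count++;                            /* critical section */
        pthread_mutex_unlock(&count_lock);  /* release lock */
        /* remainder section */
    }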

Page 27: Introduction to Threads

Deadlock and Starvation

• Deadlock – two or more processes or threads are waiting indefinitely for an event that can be caused by only one of the waiting processes

• Let S and Q be two semaphores initialized to 1 (i.e., mutual exclusion locks):

        P0                P1
      wait(S);          wait(Q);
      wait(Q);          wait(S);
        ...               ...
      signal(S);        signal(Q);
      signal(Q);        signal(S);

  If P0 acquires S while P1 acquires Q, each then blocks forever waiting for the semaphore the other holds.

• Starvation – indefinite blocking. A process may never be removed from the semaphore queue in which it is suspended

• Priority Inversion – a scheduling problem that arises when a lower-priority process holds a lock needed by a higher-priority process; the lower-priority process may have to run first so the higher-priority one can continue, which subverts the priority scheme

Page 28: Introduction to Threads

Barriers for Thread Synchronization

Barriers define synchronization points used to coordinate the execution of a team of threads. When a thread reaches a synchronization point, its execution is stopped until all other threads in the team reach the synchronization point.

Basic Barrier

A simple barrier is implemented using an atomic shared counter. The counter is incremented by each thread after entering the barrier, and threads wait at the barrier until the counter becomes equal to the number of threads. This kind of barrier cannot be reused, because the counter is never reset safely. Reusing the barrier by resetting the counter can result in starvation, because storing 0 into the counter masks the old value: if a thread is suspended during the resetting phase, it will never leave the barrier.
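
A minimal sketch of such a one-shot barrier in C11 (the names barrier_wait, arrived, and NTHREADS are illustrative):

    #include <stdatomic.h>

    #define NTHREADS 4              /* assumed team size */

    static atomic_int arrived = 0;  /* atomic shared counter */

    void barrier_wait(void) {
        atomic_fetch_add(&arrived, 1);  /* announce arrival */
        while (atomic_load(&arrived) < NTHREADS)
            ;                           /* spin until all threads arrive */
        /* there is no safe place to reset 'arrived', so the barrier is one-shot */
    }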

Page 29: Introduction to Threads

Sense-Reversing Barrier

Adding a sense flag allows a barrier to be reused many times. The barrier counter still keeps track of how many threads have reached the barrier, but the waiting phase is performed by spinning on a sense flag: threads wait until the barrier's sense flag matches their thread-private sense flag. The last thread reaching the barrier resets both the counter and the barrier sense flag, while each thread must reverse its local sense flag before exiting the barrier. The sense flag distinguishes between odd and even barrier phases, so resetting the counter is not an unsafe operation: it does not interfere with the barrier waiting variable, represented by the sense flag.
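
A minimal sense-reversing barrier sketch in C11 (names are illustrative):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define NTHREADS 4

    static atomic_int  count = 0;
    static atomic_bool sense = false;              /* shared barrier sense flag */
    static _Thread_local bool local_sense = true;  /* each thread's private sense */

    void barrier_wait(void) {
        if (atomic_fetch_add(&count, 1) + 1 == NTHREADS) {
            /* last thread in: reset the counter first, then flip the shared
               sense flag to release the waiting threads */
            atomic_store(&count, 0);
            atomic_store(&sense, local_sense);
        } else {
            /* spin until the shared sense matches this thread's private sense */
            while (atomic_load(&sense) != local_sense)
                ;
        }
        local_sense = !local_sense;  /* reverse private sense for the next phase */
    }

Resetting the counter before flipping the sense flag is what makes reuse safe: waiting threads spin on the sense flag, not the counter, so they cannot re-enter the barrier until the reset is complete.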