Multiprocessors and Threads, Lecture 3
Fred Kuhns, CS523S: Operating Systems

Transcript of slides:

  • Multiprocessors and Threads, Lecture 3

  • Motivation for Multiprocessors
      - Enhanced performance: concurrent execution of tasks for increased throughput (between processes)
      - Exploit concurrency in tasks (parallelism within a process)
      - Fault tolerance: graceful degradation in the face of failures

  • Basic MP Architectures
      - Single Instruction Single Data (SISD): conventional uniprocessor designs
      - Single Instruction Multiple Data (SIMD): vector and array processors
      - Multiple Instruction Single Data (MISD): not implemented
      - Multiple Instruction Multiple Data (MIMD): conventional MP designs

  • MIMD Classifications
      - Tightly coupled system: all processors share the same global memory and address space (typical SMP system); main memory is used for IPC and synchronization
      - Loosely coupled system: memory is partitioned and attached to each processor (hypercube, clusters, multi-computers); message passing is used for IPC and synchronization

  • MP Block Diagram

  • Memory Access Schemes
      - Uniform Memory Access (UMA): memory is centrally located and all processors are equidistant (equal access times)
      - Non-Uniform Memory Access (NUMA): memory is physically partitioned but accessible by all; processors share the same address space
      - No Remote Memory Access (NORMA): memory is physically partitioned and not accessible by all; each processor has its own address space

  • Other Details of MP
      - Interconnection technology: bus, cross-bar switch, multistage interconnect network
      - Caching and the cache coherence problem: write-update, write-invalidate, bus snooping (see the false-sharing sketch below)
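
The coherence protocol is invisible to software but not to performance. A minimal pthread sketch, purely illustrative (the 64-byte line size, iteration count, and structure names are assumptions of the sketch): two threads increment counters that share a cache line, so every write invalidates the other CPU's copy; swapping in the padded counters avoids the write-invalidate ping-ponging and typically runs noticeably faster.

    /* False-sharing demo: two threads bump adjacent counters. Sharing a
     * cache line forces write-invalidate traffic on every increment;
     * padding gives each counter its own line. */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000000L

    struct { volatile long a, b; } shared_line;                         /* same cache line */
    struct { volatile long v; char pad[64 - sizeof(long)]; } padded[2]; /* separate lines  */

    static void *bump(void *p) {
        volatile long *ctr = p;
        for (long i = 0; i < ITERS; i++)
            (*ctr)++;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        /* Compare run time after swapping in &padded[0].v and &padded[1].v. */
        pthread_create(&t1, NULL, bump, (void *)&shared_line.a);
        pthread_create(&t2, NULL, bump, (void *)&shared_line.b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld %ld\n", shared_line.a, shared_line.b);
        return 0;
    }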

  • MP OS Structure - 1: Separate Supervisor
      - Each processor has its own copy of the kernel; some data is shared for interaction
      - Dedicated I/O devices and file systems
      - Good fault tolerance, but bad for concurrency

  • MP OS Structure - 2: Master/Slave Configuration
      - The master monitors status and assigns work to the other processors (slaves)
      - Slaves are a schedulable pool of resources for the master
      - The master can become a bottleneck; poor fault tolerance

  • MP OS Structure - 3: Symmetric Configuration (most flexible)
      - All processors are autonomous and treated equally
      - One copy of the kernel is executed concurrently across all processors
      - Access to shared data structures must be synchronized: lock the entire OS (floating master); mitigate by dividing the OS into segments that normally have little interaction; or multithread the kernel and control access to individual resources (a continuum)

  • MP Overview (taxonomy): multiprocessors divide into SIMD and MIMD; MIMD divides into shared memory (tightly coupled) and distributed memory (loosely coupled); shared-memory systems into master/slave and symmetric (SMP); distributed-memory systems include clusters

  • SMP OS Design Issues
      - Threads: the effectiveness of parallelism depends on the performance of the primitives used to express and control concurrency
      - Process synchronization: disabling interrupts is not sufficient (see the spinlock sketch below)
      - Process scheduling: efficient, policy-controlled task scheduling (processes/threads); global versus per-CPU scheduling; task affinity for a particular CPU; resource accounting and intra-task thread dependencies
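
On a multiprocessor, disabling interrupts only quiets the local CPU; another CPU can still touch the shared data, so some form of atomic lock is needed. A minimal sketch of a spinlock built on an atomic test-and-set using C11 atomics (the names are mine, not from any particular kernel):

    #include <stdatomic.h>

    typedef struct { atomic_flag locked; } spinlock_t;
    #define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

    static void spin_lock(spinlock_t *l) {
        /* Atomically set the flag; if it was already set, another CPU holds
         * the lock, so busy-wait until it is released. */
        while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
            ;  /* spin */
    }

    static void spin_unlock(spinlock_t *l) {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }

    /* Usage sketch: protect a (hypothetical) system-wide run queue. */
    static spinlock_t runqueue_lock = SPINLOCK_INIT;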

  • SMP OS Design Issues - 2
      - Memory management: complicated because main memory is shared by possibly many processors; each processor must maintain its own map tables for each process; cache coherence; memory access synchronization; balancing overhead against increased concurrency
      - Reliability and fault tolerance: degrade gracefully in the event of failures

  • Typical SMP System (block diagram): several CPU/MMU/cache modules on a ~500 MHz system/memory bus, bridged to an I/O subsystem (ethernet, SCSI, video) and to system functions (timer, BIOS, reset, interrupts)
      - Issues: memory contention, limited bus bandwidth, I/O contention, cache coherence
      - Typical I/O bus: 33 MHz/32-bit (132 MB/s) or 66 MHz/64-bit (528 MB/s)

  • Some Definitions
      - Parallelism: the degree to which a multiprocessor application achieves parallel execution
      - Concurrency: the maximum parallelism an application can achieve with unlimited processors
      - System concurrency: the kernel recognizes multiple threads of control in a program
      - User concurrency: user-space threads (coroutines) provide a natural programming model for concurrent applications; the concurrency is not supported by the system

  • Process and Threads
      - Process: encompasses a set of threads (computational entities) and a collection of resources
      - Thread: a dynamic object representing an execution path and computational state; each thread has its own computational state (PC, stack, user registers, private data), while the remaining resources are shared among the threads in a process (see the sketch below)
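
A minimal pthread sketch of that sharing model (the names and iteration counts are illustrative only): both threads see the same global variable, a per-process resource, while each has its own stack and registers and therefore its own copy of the local variable.

    #include <pthread.h>
    #include <stdio.h>

    int shared_counter = 0;                 /* shared by all threads in the process */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        int local = 0;                      /* lives on this thread's private stack */
        for (int i = 0; i < 1000; i++) {
            local++;
            pthread_mutex_lock(&lock);      /* shared data must be synchronized */
            shared_counter++;
            pthread_mutex_unlock(&lock);
        }
        printf("thread %ld: local=%d\n", (long)arg, local);   /* always 1000 */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("shared_counter=%d\n", shared_counter);        /* 2000 */
        return 0;
    }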

  • Threads
      - The effectiveness of parallel computing depends on the performance of the primitives used to express and control parallelism
      - Threads separate the notion of execution from the process abstraction
      - Useful for expressing the intrinsic concurrency of a program regardless of the resulting performance
      - Three types: user threads, kernel threads, and lightweight processes (LWPs)

  • User Level Threads
      - Supported by a user-level (thread) library
      - Benefits: no modifications to the kernel required; flexible and low cost
      - Drawbacks: a thread cannot block without blocking the entire process; no parallelism (the threads are not recognized by the kernel); see the coroutine sketch below
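
At bottom a user-level thread is just a saved register/stack context that the library switches in user space. A minimal coroutine-style sketch using the (obsolescent but widely available) ucontext calls; the stack size and function names are assumptions of the sketch:

    #include <ucontext.h>
    #include <stdio.h>

    static ucontext_t main_ctx, thr_ctx;
    static char thr_stack[64 * 1024];

    static void user_thread(void) {
        printf("user thread: running with no kernel involvement\n");
        /* Returning resumes uc_link (main_ctx); a real library would instead
         * pick the next runnable user thread here. */
    }

    int main(void) {
        getcontext(&thr_ctx);                   /* initialize a context        */
        thr_ctx.uc_stack.ss_sp   = thr_stack;   /* give it its own user stack  */
        thr_ctx.uc_stack.ss_size = sizeof thr_stack;
        thr_ctx.uc_link          = &main_ctx;   /* where to go when it returns */
        makecontext(&thr_ctx, user_thread, 0);

        printf("main: switching to user thread\n");
        swapcontext(&main_ctx, &thr_ctx);       /* pure user-space switch      */
        printf("main: back again\n");
        return 0;
    }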

  • Kernel Level Threads
      - The kernel directly supports multiple threads of control in a process; the thread is the basic scheduling entity
      - Benefits: coordination between scheduling and synchronization; less overhead than a process; suitable for parallel applications
      - Drawbacks: more expensive than user-level threads; generality leads to greater overhead (see the contention-scope sketch below)
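
With POSIX threads, the contention-scope attribute is the usual way to ask for a kernel-scheduled thread; whether a given system supports both scopes is implementation dependent, so treat this as a sketch:

    #include <pthread.h>
    #include <stdio.h>

    static void *work(void *arg) { (void)arg; return NULL; }

    int main(void) {
        pthread_attr_t attr;
        pthread_t tid;

        pthread_attr_init(&attr);
        /* PTHREAD_SCOPE_SYSTEM: the thread competes system-wide and is
         * scheduled by the kernel (kernel/bound thread).
         * PTHREAD_SCOPE_PROCESS: the thread is scheduled by the library
         * within the process (user/unbound thread). */
        if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
            printf("system contention scope not supported here\n");

        pthread_create(&tid, &attr, work, NULL);
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }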

  • Light Weight Processes (LWP)
      - A kernel-supported user thread
      - Each LWP is bound to one kernel thread, but a kernel thread may not be bound to an LWP
      - An LWP is scheduled by the kernel; user threads are scheduled by the library onto LWPs
      - There may be multiple LWPs per process (see the bound/unbound sketch below)
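
A sketch of bound versus unbound threads using the Solaris threads API (thr_create with and without THR_BOUND); the work() function is a placeholder and the exact behavior is Solaris-specific:

    #include <thread.h>    /* Solaris threads; link with -lthread */

    static void *work(void *arg) { (void)arg; return NULL; }

    int main(void) {
        thread_t bound_tid, unbound_tid;

        /* THR_BOUND: the user thread is permanently bound to its own LWP,
         * so the kernel schedules it directly. */
        thr_create(NULL, 0, work, NULL, THR_BOUND, &bound_tid);

        /* Without THR_BOUND: an unbound thread that the library multiplexes
         * onto whatever LWPs the process has available. */
        thr_create(NULL, 0, work, NULL, 0, &unbound_tid);

        thr_join(bound_tid, NULL, NULL);
        thr_join(unbound_tid, NULL, NULL);
        return 0;
    }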

  • First-Class Threads (Psyche OS)
      - Thread operations in user space: create, destroy, synchronize, context switch; kernel threads implement a virtual processor
      - Coarse grain in the kernel: preemptive scheduling
      - Communication between the kernel and the threads library: shared data structures, and software interrupts (user upcalls or signals), for example for scheduling decisions and preemption warnings
      - A kernel scheduler interface allows dissimilar thread packages to coordinate

  • Scheduler Activations
      - An activation serves as the execution context for a running thread, notifies the thread of kernel events (upcalls), and provides space for the kernel to save the processor context of the current user thread when it is stopped by the kernel
      - The kernel is responsible for processor allocation, which implies preemption by the kernel
      - The thread package is responsible for scheduling threads onto the available processors (activations)

  • Support for Threading
      - BSD: process model only; 4.4BSD adds some enhancements
      - Solaris: provides user threads, kernel threads, and LWPs
      - Mach: supports kernel threads and tasks; thread libraries provide the semantics of user threads, LWPs, and kernel threads
      - Digital UNIX: extends Mach to provide the usual UNIX semantics; Pthreads library

  • Solaris Threads: supports
      - User threads (uthreads), via libthread and libpthread
      - LWPs, each acting as a virtual CPU for user threads
      - Kernel threads (kthreads); every LWP is associated with one kthread, but a kthread may not have an LWP
      - Interrupts as threads

  • Solaris kthreads
      - The fundamental scheduling/dispatching object
      - All kthreads share the same virtual address space (the kernel's), so context switches are cheap
      - System threads, for example STREAMS and callout
      - kthread_t (/usr/include/sys/thread.h): scheduling info, pointers for scheduler or sleep queues, pointers to klwp_t and proc_t

  • Solaris LWP
      - Bound to a kthread
      - LWP-specific fields from proc are kept in klwp_t (/usr/include/sys/klwp.h): user-level registers, system call parameters, resource usage, pointers to kthread_t and proc_t
      - klwp_t can be swapped out with the LWP; the LWP's non-swappable info is kept in kthread_t

  • Solaris LWP (cont.)
      - All LWPs in a process share the signal handlers
      - Each may have its own signal mask and its own alternate stack for signal handling (see the sketch below)
      - There is no global name space for LWPs
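
The per-thread signal mask and alternate stack can be sketched with portable POSIX calls (the same model the Solaris libraries expose); SIGINT and the malloc'd stack below are arbitrary choices for the sketch:

    #include <pthread.h>
    #include <signal.h>
    #include <stdlib.h>

    static void *worker(void *arg) {
        (void)arg;

        /* Block SIGINT in this thread only; the handler table itself is
         * still shared by the whole process. */
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGINT);
        pthread_sigmask(SIG_BLOCK, &mask, NULL);

        /* Give this thread its own alternate stack for SA_ONSTACK handlers. */
        stack_t ss;
        ss.ss_sp    = malloc(SIGSTKSZ);
        ss.ss_size  = SIGSTKSZ;
        ss.ss_flags = 0;
        sigaltstack(&ss, NULL);

        /* ... do work ... */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        return 0;
    }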

  • Solaris User Threads
      - Implemented in user libraries; the library provides synchronization and scheduling facilities
      - Threads may be bound to LWPs; unbound threads compete for the available LWPs
      - The library manages thread-specific info: thread id, saved register state, user stack, signal mask, priority, thread-local storage (see the sketch below)
      - Solaris provides two libraries, libthread and libpthread; try man thread or man pthreads
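
Thread-local storage of the kind listed above can be sketched with the POSIX thread-specific data interface; the key name and the stored strings are illustrative only:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_key_t name_key;

    static void *worker(void *arg) {
        pthread_setspecific(name_key, arg);   /* each thread stores its own value */
        printf("I am %s\n", (char *)pthread_getspecific(name_key));
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_key_create(&name_key, NULL);  /* no destructor in this sketch */
        pthread_create(&t1, NULL, worker, "thread A");
        pthread_create(&t2, NULL, worker, "thread B");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        pthread_key_delete(name_key);
        return 0;
    }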

  • Solaris Thread Data Structures (diagram): proc_t, kthread_t, and klwp_t are linked by pointer fields such as p_tlist, t_procp, t_forw, t_lwp, lwp_thread, and lwp_procp

  • Solaris: Processes, Threads and LWPs (diagram): user threads in each process are layered over LWPs and kernel threads, spanning the user, kernel, and hardware levels; interrupt kthreads run in the kernel

  • Solaris Interrupts
      - One system-wide clock kthread
      - A pool of 9 partially initialized kthreads per CPU for interrupts
      - An interrupt thread can block; the interrupted thread is pinned to the CPU

  • Solaris Signals and Fork
      - Signals are divided into traps (synchronous) and interrupts (asynchronous)
      - Each thread has its own signal mask; the set of signal handlers is global
      - Each LWP can specify an alternate stack
      - fork replicates all LWPs; fork1 replicates only the invoking LWP/thread (see the sketch below)
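
A hedged sketch of the fork/fork1 distinction (fork1 is Solaris-specific, and later Solaris releases change plain fork to behave the same way, so treat the exact semantics as version dependent):

    #include <unistd.h>
    #include <stdio.h>

    int main(void) {
        /* fork1(): the child starts with only the calling LWP/thread.
         * fork():  classic Solaris behavior replicates all LWPs into
         *          the child. */
        pid_t pid = fork1();
        if (pid == 0) {
            printf("child: only the forking thread was replicated\n");
            _exit(0);
        }
        return 0;
    }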

  • Mach: two abstractions
      - Task: a static object consisting of an address space and system resources called port rights
      - Thread: the fundamental execution unit, running in the context of a task; zero or more threads per task; kernel schedulable, with a kernel stack and computational state
      - Processor sets: the available processors are divided into non-intersecting sets, permitting processor sets to be dedicated to one or more tasks

  • Mach C-thread Implementations (see the sketch below)
      - Coroutine-based: multiplexes user threads onto a single-threaded task
      - Thread-based: one-to-one mapping from c-threads to Mach threads (the default)
      - Task-based: one Mach task per c-thread
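
A minimal sketch against the classic CMU C-threads interface that all three implementations sit behind; treat the header name and exact signatures as assumptions of this sketch:

    #include <cthreads.h>

    static any_t work(any_t arg) {
        /* Runs as a coroutine, a Mach thread, or a separate task, depending
         * on which C-threads implementation the program is linked against. */
        return arg;
    }

    int main(void) {
        cthread_t t = cthread_fork(work, (any_t)0);
        cthread_join(t);
        return 0;
    }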

  • Digital UNIX
      - Based on the Mach 2.5 kernel; provides the complete UNIX programmer's interface
      - 4.3BSD and ULTRIX code ported to Mach
      - The u-area is replaced by utask and uthread; the proc structure is retained

  • Digital UNIX Threads
      - Signals are divided into synchronous and asynchronous
      - There is a global signal mask
      - Each thread can define its own handlers for synchronous signals; handlers for asynchronous signals are global

  • Pthreads Library
      - One Mach thread per pthread
      - Implements asynchronous I/O: a separate thread is created for the synchronous I/O and signals the original thread on completion (see the sketch below)
      - The library includes signal handling, scheduling functions, and synchronization primitives
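
The "separate thread for synchronous I/O" idea can be sketched with portable pthread calls rather than the Digital UNIX internals: a helper thread performs the blocking read and notifies the requesting thread when it is done (the file descriptor and buffer size are arbitrary here).

    #include <pthread.h>
    #include <unistd.h>
    #include <stdio.h>

    static pthread_mutex_t m    = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;
    static int     io_done = 0;
    static ssize_t nread   = 0;
    static char    buf[128];

    static void *io_helper(void *arg) {
        int fd = *(int *)arg;
        ssize_t n = read(fd, buf, sizeof buf);  /* blocks only this helper */
        pthread_mutex_lock(&m);
        nread   = n;
        io_done = 1;
        pthread_cond_signal(&done);             /* "I/O complete" notification */
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void) {
        int fd = 0;                             /* stdin, for the sketch */
        pthread_t helper;
        pthread_create(&helper, NULL, io_helper, &fd);

        /* The original thread is free to do other work, then waits. */
        pthread_mutex_lock(&m);
        while (!io_done)
            pthread_cond_wait(&done, &m);
        pthread_mutex_unlock(&m);

        printf("read %zd bytes asynchronously\n", nread);
        pthread_join(helper, NULL);
        return 0;
    }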

  • Mach Continuations
      - Address the problem of excessive kernel stack memory requirements: process model versus interrupt model, one kernel stack per process versus a kernel stack per thread
      - A blocking thread first saves any required state (the thread structure allows up to 28 bytes) and indicates a function to be invoked when it is unblocked (the continuation function)
      - Advantage: the stack can be transferred between threads, eliminating copy overhead (see the conceptual sketch below)
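
A purely conceptual C sketch of the continuation idea, not Mach kernel code: rather than holding a whole kernel stack while blocked, the thread records a small save area plus the function to run when it is woken, so the current stack can be handed to another thread.

    typedef void (*continuation_t)(void *saved);

    struct thread {
        continuation_t cont;   /* function to invoke on wakeup                 */
        char saved[28];        /* small per-thread save area (as on the slide) */
    };

    /* Block: record the continuation and give up this kernel stack. */
    static void thread_block(struct thread *t, continuation_t cont) {
        t->cont = cont;
        /* ... switch to another thread, reusing the current kernel stack ... */
    }

    /* Wakeup: resume the thread by calling its continuation on a fresh stack. */
    static void thread_wakeup(struct thread *t) {
        t->cont(t->saved);
    }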

  • Threads in Windows NT
      - Design driven by the need to support a variety of OS environments
      - An NT process is implemented as an object; an executable process contains one or more threads
      - Process and thread objects have built-in synchronization capabilities

  • NT Threads
      - Support for kernel (system) threads: threads are scheduled by the kernel, and are thus similar to UNIX threads bound to an LWP (kernel thread)
      - Fibers are threads that are not scheduled by the kernel, and are thus similar to unbound user threads (see the sketch below)
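
A minimal Win32 fiber sketch for that comparison (the strings are illustrative): fibers are created and switched explicitly in user space, never by the kernel scheduler.

    #include <windows.h>
    #include <stdio.h>

    static LPVOID main_fiber;

    static VOID CALLBACK fiber_proc(LPVOID param) {
        printf("fiber: %s\n", (char *)param);
        SwitchToFiber(main_fiber);        /* explicit, user-level switch back */
    }

    int main(void) {
        main_fiber = ConvertThreadToFiber(NULL);   /* current thread becomes a fiber */
        LPVOID f = CreateFiber(0, fiber_proc, "unbound, user-scheduled work");
        SwitchToFiber(f);                          /* user code decides when it runs */
        printf("main fiber: back\n");
        DeleteFiber(f);
        return 0;
    }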

  • 4.4BSD UNIX
      - Initial support for threads is implemented but not enabled in the distribution
      - The proc structure and u-area are reorganized; all threads have a unique ID
      - Question: how are the proc and u areas reorganized to support threads?