Multi-processor Scheduling
Two implementation choices:
- Single, global ready queue
- Per-processor run queue
Which is better?
Queue-per-processor
Advantages of queue per processor:
- Promotes processor affinity (better cache locality)
- Removes a centralized bottleneck (which runs in global memory)
- Supported by default in Linux 2.6
- Java 1.6 support: a double-ended queue (java.util.Deque)
  - Use a bounded buffer per consumer
  - If nothing in a consumer's queue, steal work from somebody else
  - If too much in the queue, push work somewhere else
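A minimal sketch of the per-consumer deque idea above. The class name and stealing policy here are illustrative assumptions (not the actual Linux or java.util.concurrent implementation): each worker pushes and pops its own deque at the head, and steals from the tail of another worker's deque when its own is empty.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

class WorkStealingQueues {
    private final ConcurrentLinkedDeque<Runnable>[] queues;

    @SuppressWarnings("unchecked")
    WorkStealingQueues(int nWorkers) {
        queues = new ConcurrentLinkedDeque[nWorkers];
        for (int i = 0; i < nWorkers; i++)
            queues[i] = new ConcurrentLinkedDeque<Runnable>();
    }

    // A worker adds work to the head of its own deque
    void push(int worker, Runnable task) {
        queues[worker].addFirst(task);
    }

    // A worker takes from its own head; if that is empty,
    // it steals from the tail of somebody else's deque
    Runnable next(int worker) {
        Runnable task = queues[worker].pollFirst();
        if (task != null) return task;
        for (int i = 0; i < queues.length; i++) {
            if (i != worker) {
                task = queues[i].pollLast();
                if (task != null) return task;
            }
        }
        return null; // nothing to run anywhere
    }
}
```

Owners pop at the head while thieves steal at the tail, so the two rarely touch the same end of the deque: that preserves the owner's cache locality and keeps contention low.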
Thread Implementation Issues
Andrew Whitaker
Where do Threads Come From?
A few choices:
- The operating system
- A user-mode library
- Some combination of the two…
Option #1: Kernel Threads
Threads implemented inside the OS:
- Thread operations (creation, deletion, yield) are system calls
- Scheduling handled by the OS scheduler
Described as "one-to-one":
- One user thread mapped to one kernel thread
- Every invocation of Thread.start() creates a kernel thread
[Figure: a process whose user threads map one-to-one onto OS threads]
Option #2: User threads
Implemented as a library inside a process:
- All operations (creation, destruction, yield) are normal procedure calls
Described as "many-to-one":
- Many user-perceived threads map to a single OS process/thread
[Figure: many user threads inside a process mapped onto a single OS thread]
Process Address Space Review
Every process has a user stack and a program counter
In addition, each process has a kernel stack and program counter (not shown here)
[Figure: single-threaded address space — code (text segment), static data (data segment), heap (dynamically allocated memory), and stack, with SP and PC registers]
[Figure: multi-threaded address space — shared code (text segment), static data (data segment), and heap, with separate stacks, SPs, and PCs for threads 1–3]
Threaded Address Space
Every thread always has its own user stack and program counter, for both user and kernel threads
For user threads, there is only a single kernel stack, program counter, PCB, etc.
User address space (for both user and kernel threads)
User Threads vs. Kernel Threads
User threads are faster:
- Operations do not pass through the OS
But, user threads suffer from:
- Lack of physical parallelism: they only run on a single processor!
- Poor performance with I/O: a single blocking operation stalls the entire application
For these reasons, most (all?) major OSes provide some form of kernel threads
When Would User Threads Be Useful?
- The calculator?
- The web server?
- The Fibonacci GUI?
Option #3: Two-level Model
OS supports native multi-threading
And, a user library maps multiple user threads to a single kernel thread
Described as "many-to-many"
Potentially captures the best of both worlds:
- Cheap thread operations
- Parallelism
[Figure: many user threads in a process multiplexed onto several OS threads]
Problems with Many-to-Many Threads
Lack of coordination between user and kernel schedulers: the left hand is not talking to the right
Specific problems:
- Poor performance: e.g., the OS preempts a thread holding a crucial lock
- Deadlock: given K kernel threads, at most K user threads can block
  - Other runnable threads are starved out!
Scheduler Activations, UW 1991
Add a layer of communication between kernel and user schedulers
Examples:
- Kernel tells user-mode that a task has blocked: the user scheduler can re-use this execution context
- Kernel tells user-mode that a task is ready to resume: allows the user scheduler to alter the user-thread/kernel-thread mapping
Supported by the newest release of NetBSD
Implementation Spilling Over into the Interface
In practice, programmers have learned to live with expensive kernel threads
For example, thread pools: re-use a static set of threads throughout the lifetime of the program
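The thread-pool pattern can be sketched with java.util.concurrent's ExecutorService (the class name PoolDemo and the task counts are illustrative): a fixed set of kernel threads is created once up front and re-used for every submitted task, so the expensive thread-creation cost is paid only four times, not one hundred.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Four kernel threads are created once, up front...
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // ...and re-used for all 100 tasks: no per-task Thread.start()
        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.submit(new Runnable() {
                public void run() {
                    // do the work for taskId
                }
            });
        }

        pool.shutdown();                             // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for queued tasks
    }
}
```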
Locks
Used for implementing critical sections
Modern languages (Java, C#) implicitly acquire and release locks

    interface Lock {
        public void acquire();
        // only one thread allowed between an
        // acquire and a release
        public void release();
    }
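For instance, Java's synchronized statement acquires the monitor lock on entry and releases it on every exit path, including exceptional ones (the Counter class below is an illustrative example, not from the slides):

```java
class Counter {
    private int count = 0;
    private final Object lock = new Object();

    void increment() {
        synchronized (lock) {   // acquire() happens implicitly here
            count++;
        }                       // release() happens implicitly here,
                                // even if an exception is thrown
    }

    int get() {
        synchronized (lock) { return count; }
    }
}
```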
Two Varieties of Locks
Spin locks:
- Threads busy-wait until the lock is freed
- The thread stays in the ready/running state
Blocking locks:
- Threads yield the processor until the lock is freed
- The thread transitions to the blocked state
Why Use Spin Locks?
Spin locks can be faster:
- No context switching required
Sometimes, blocking is not an option:
- For example, in the kernel scheduler implementation
Spin locks are never used on a uniprocessor
Bogus Spin Lock Implementation
    class SpinLock implements Lock {
        private volatile boolean isLocked = false;

        public void acquire() {
            while (isLocked) { ; }  // busy wait
            isLocked = true;
        }

        public void release() {
            isLocked = false;
        }
    }
Multiple threads can acquire this lock!
Hardware Support for Locking
Problem: lack of atomicity in testing and setting the isLocked flag
Solution: hardware-supported atomic instructions, e.g., atomic test-and-set
Java conveniently abstracts these primitives (AtomicInteger and friends)
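For example, AtomicBoolean.getAndSet returns the old value and installs the new one in a single atomic step, which is exactly the test-and-set primitive a correct spin lock needs (this snippet is a standalone illustration):

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TestAndSetDemo {
    public static void main(String[] args) {
        AtomicBoolean flag = new AtomicBoolean(false);

        // Atomically: read the old value AND store true.
        // No other thread can sneak in between the read and the write.
        boolean wasLocked = flag.getAndSet(true);

        System.out.println(wasLocked);            // prints "false": we got the lock
        System.out.println(flag.getAndSet(true)); // prints "true": a second caller must retry
    }
}
```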
Corrected Spin Lock
    class SpinLock implements Lock {
        private final AtomicBoolean isLocked =
            new AtomicBoolean(false);

        public void acquire() {
            // get the old value, set a new value
            while (isLocked.getAndSet(true)) { ; }
        }

        public void release() {
            assert (isLocked.get() == true);
            isLocked.set(false);
        }
    }
Blocking Locks: Acquire Implementation
Atomically test-and-set the locked status
If the lock is already held:
- Set the thread state to blocked
- Add the PCB (task_struct) to a wait queue
- Invoke the scheduler
Problem: must ensure thread-safe access to the wait queue!
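A user-level Java sketch of the same idea (purely illustrative; a real kernel manipulates its run queue and PCBs, not a monitor): the intrinsic monitor plays the kernel's role here, making the test-and-set and the wait-queue manipulation one atomic unit so that a release can never slip in between the test and the block.

```java
class BlockingLock {
    private boolean isLocked = false;

    // synchronized makes the test and the enqueue atomic, which is
    // exactly the "thread-safe wait queue access" problem above
    public synchronized void acquire() throws InterruptedException {
        while (isLocked) {
            wait();          // block: join the monitor's wait queue
        }
        isLocked = true;     // lock was free: take it
    }

    public synchronized void release() {
        isLocked = false;
        notify();            // wake one blocked thread, if any
    }
}
```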
Disabling Interrupts
Prevents the processor from being interrupted
Serves as a coarse-grained lock
Must be used with extreme care: no I/O or timers can be processed
Thread-safe Blocking Locks
Atomically test-and-set the locked status
If the lock is already held:
- Set the thread state to blocked
- Disable interrupts
- Add the PCB (task_struct) to a wait queue
- Invoke the scheduler
- The next task re-enables interrupts
Disabling Interrupts on a Multiprocessor
Disabling interrupts can be done locally or globally (for all processors)
- Global disabling is extremely heavyweight
Linux: spin_lock_irq
- Disable interrupts on the local processor
- Grab a spin lock to lock out other processors
Preview For Next Week

    public class Example extends Thread {
        private static int x = 1;
        private static int y = 1;
        private static boolean ready = false;

        public static void main(String[] args) {
            Thread t = new Example();
            t.start();

            x = 2;
            y = 2;
            ready = true;
        }

        public void run() {
            while (!ready)
                Thread.yield();  // give up the processor
            System.out.println("x= " + x + " y= " + y);
        }
    }
What Does This Program Print?
Answer: it's a race condition. Many different outputs are possible:
- x=2, y=2
- x=1, y=2
- x=2, y=1
- x=1, y=1
- Or, the program may print nothing! The ready loop runs forever