CS519: Lecture 7 – Uniprocessor and Multiprocessor Scheduling



Page 2

CS 519 – Operating Systems Theory, DCS, Rutgers University

What and Why?

What is processor scheduling? Why?

Originally, to share an expensive resource – multiprogramming

Now, to perform concurrent tasks because the processor is so powerful

The future looks like the past plus the present:

Rent-a-computer approach – large data/processing centers use multiprogramming to maximize resource utilization

Systems still powerful enough for each user to run multiple concurrent tasks

Page 3

Assumptions

Pool of jobs contending for the CPU

The CPU is a scarce resource

Jobs are independent and compete for resources (this assumption does not always hold)

The scheduler mediates between jobs to optimize some performance criterion

Page 4

Types of Scheduling

We’re mostly concerned with short-term scheduling

Page 5

What Do We Optimize?

System-oriented metrics:

Processor utilization: percentage of time the processor is busy

Throughput: number of processes completed per unit of time

User-oriented metrics:

Turnaround time: interval between submission and termination (including any waiting time). Appropriate for batch jobs

Response time: for interactive jobs, time from the submission of a request until the response begins to arrive

Deadlines: when process completion deadlines are specified, the percentage of deadlines met should be maximized

Page 6

Design Space

Two dimensions:

Selection function: which of the ready jobs should be run next?

Preemption:

Preemptive: the currently running job may be interrupted and moved to the Ready state

Non-preemptive: once a process is in the Running state, it continues to execute until it terminates or blocks for I/O or a system service

Page 7

Job Behavior

I/O-bound jobs: jobs that perform lots of I/O; tend to have short CPU bursts

CPU-bound jobs: jobs that perform very little I/O; tend to have very long CPU bursts

The distribution of CPU burst lengths tends to be hyper-exponential: a very large number of very short CPU bursts and a small number of very long CPU bursts

[Figure: a job alternating between CPU bursts and disk I/O]

Page 8

Scheduling Algorithms

FIFO: non-preemptive

Round-Robin: preemptive

Shortest Job Next (SJN): non-preemptive

Shortest Remaining Time (SRT): preemptive at arrival

Highest Response Ratio Next (HRRN, response ratio = turnaround/service time): non-preemptive

Priority with feedback: preemptive at time quantum

Page 9

Example Job Set

Process   Arrival Time   Service Time
1         0              3
2         2              6
3         4              4
4         6              5
5         8              2
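These policies are easiest to compare on a concrete workload. A minimal sketch (Python; the `fifo` helper is illustrative, not from the lecture) that runs FIFO on the job set above and reports per-process turnaround times:

```python
# Simulate FIFO (non-preemptive) on the example job set.
# Each job: (id, arrival_time, service_time); jobs are assumed
# already sorted by arrival time, as in the table above.
jobs = [(1, 0, 3), (2, 2, 6), (3, 4, 4), (4, 6, 5), (5, 8, 2)]

def fifo(jobs):
    """Return {job_id: turnaround_time} under FIFO scheduling."""
    clock = 0
    turnaround = {}
    for jid, arrival, service in jobs:
        clock = max(clock, arrival)      # idle until the job arrives
        clock += service                 # run the job to completion
        turnaround[jid] = clock - arrival
    return turnaround

t = fifo(jobs)
print(t)                                 # {1: 3, 2: 7, 3: 9, 4: 12, 5: 12}
print(sum(t.values()) / len(t))          # mean turnaround = 8.6
```

Note how the short job 5 waits behind every earlier arrival, which is exactly the FIFO weakness discussed on a later slide.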

Page 10

Behavior of Scheduling Policies

Page 11

Behavior of Scheduling Policies

Page 12

Priority with Feedback Scheduling

After each preemption, the process moves to a lower-priority queue
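A minimal sketch of this discipline (Python; the class shape, number of levels, and method names are my own assumptions): the scheduler always serves the highest non-empty queue, and a job that exhausts its quantum drops one level.

```python
from collections import deque

class FeedbackScheduler:
    """Priority with feedback: a job preempted at the end of its quantum
    is demoted one priority level; jobs that block or finish early are not."""
    def __init__(self, levels=3):
        self.queues = [deque() for _ in range(levels)]

    def add(self, job, level=0):
        self.queues[level].append(job)          # new jobs enter at the top

    def pick(self):
        """Return (job, level) from the highest-priority non-empty queue."""
        for level, q in enumerate(self.queues):
            if q:
                return q.popleft(), level
        return None, None

    def preempt(self, job, level):
        """Quantum expired: demote one level (bottom queue is the floor)."""
        self.queues[min(level + 1, len(self.queues) - 1)].append(job)

sched = FeedbackScheduler()
sched.add("A"); sched.add("B")
job, lvl = sched.pick()          # "A" from level 0
sched.preempt(job, lvl)          # "A" drops to level 1
print(sched.pick()[0])           # "B" (level 0 outranks demoted "A")
```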

Page 13

Scheduling Algorithms

FIFO is simple but leads to poor average response times: short processes are delayed by long processes that arrive before them

RR eliminates this problem, but favors CPU-bound jobs, which have longer CPU bursts than I/O-bound jobs

SJN, SRT, and HRRN alleviate the problem with FIFO, but require information on the length of each process. This information is not always available (although it can sometimes be approximated from past history or user input)

Feedback is a way of alleviating the problem with FIFO without requiring information on process length
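HRRN's selection function fits in a few lines: the response ratio R = (waiting time + service time) / service time grows the longer a job waits, and grows fastest for short jobs. A sketch (Python; assumes estimated service times are available, and the helper names are hypothetical):

```python
def response_ratio(wait, service):
    # HRRN: R = (w + s) / s; rises with waiting time, favors short jobs
    return (wait + service) / service

def hrrn_pick(ready, now):
    """ready: list of (job_id, arrival_time, service_time).
    Pick the job with the highest response ratio at time `now`."""
    return max(ready, key=lambda j: response_ratio(now - j[1], j[2]))

# Hypothetical ready queue at time 5: a long early job vs. a short late one.
ready = [(1, 0, 6), (2, 3, 2)]
print(hrrn_pick(ready, 5)[0])    # 2 (ratio 2.0 beats job 1's 11/6)
```

This is how HRRN avoids starving long jobs: their ratio keeps climbing while they wait, so they eventually win.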

Page 14

It’s a Changing World

The assumption of a bi-modal workload no longer holds: interactive continuous-media applications are sometimes processor-bound but require good response times

The new computing model requires more flexibility:

How to match priorities of cooperative jobs, such as client/server jobs?

How to balance execution among the multiple threads of a single process?

Page 15

Lottery Scheduling

Randomized resource allocation mechanism

Resource rights are represented by lottery tickets

Scheduling happens in rounds of a lottery: in each round, the winning ticket (and therefore the winner) is chosen at random

Your chance of winning depends directly on the number of tickets you hold: P[winning] = t/T, where t = your number of tickets and T = the total number of tickets

Page 16

Lottery Scheduling

After n rounds, your expected number of wins is E[wins] = n × P[winning]

The expected number of lotteries a client must wait before its first win is E[wait] = 1/P[winning]

Lottery scheduling implements proportional-share resource management

Ticket currencies allow isolation between users, processes, and threads

OK, so how do we actually schedule the processor using lottery scheduling?
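A sketch of the core draw (Python; this shows only the basic mechanism, not the currency machinery from the paper): draw a number in [0, T) and walk the client list to find the holder of the winning ticket.

```python
import random

def lottery_pick(clients, rng=random):
    """clients: dict of client -> ticket count.
    Each client wins with probability tickets/total, i.e. P[winning] = t/T."""
    total = sum(clients.values())
    winner_ticket = rng.randrange(total)       # winning ticket in [0, T)
    for client, tickets in clients.items():
        if winner_ticket < tickets:
            return client
        winner_ticket -= tickets               # skip this client's tickets

clients = {"A": 75, "B": 25}                   # 3:1 ticket ratio
wins = {"A": 0, "B": 0}
for _ in range(10_000):                        # 10,000 lottery rounds
    wins[lottery_pick(clients)] += 1
print(wins["A"] / wins["B"])                   # should be close to 3
```

With E[wait] = 1/P[winning], client B here waits on average 100/25 = 4 rounds for its first win.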

Page 17

Implementation

Page 18

Performance

Allocated and observed execution ratios between two tasks running the Dhrystone benchmark. With the exception of the 10:1 allocation ratio, all observed ratios are close to their allocations.

Page 19

Short-term Allocation Ratio

Page 20

Isolation

Five tasks running the Dhrystone benchmark. Let amount.currency denote a ticket allocation of amount denominated in currency. Tasks A1 and A2 have allocations 100.A and 200.A, respectively. Tasks B1 and B2 have allocations 100.B and 200.B, respectively. Halfway through the experiment, B3 is started with allocation 300.B. This inflates the number of tickets in B from 300 to 600. There is no effect on tasks in currency A or on the aggregate iteration ratio of A tasks to B tasks. Tasks B1 and B2 slow to half their original rates, corresponding to the factor-of-2 inflation caused by B3.
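The arithmetic behind this isolation can be sketched as follows (a simplified funding model, assumed for illustration; the paper's currency mechanism is more general): a task's base share is its fraction of the tickets issued in its currency, scaled by the currency's fixed funding.

```python
def base_share(task_tickets, currency_issued, currency_funding):
    """Base-currency share of a task: inflation inside one currency
    dilutes only that currency's tasks, not other currencies."""
    return task_tickets / currency_issued * currency_funding

# Before B3 starts: B1 holds 100.B of the 300 tickets issued in B.
before = base_share(100, 300, 1)
# After B3 starts with 300.B, B's issue inflates from 300 to 600.
after = base_share(100, 600, 1)
print(before, after)     # B1's share halves; currency A is unaffected
```

This matches the figure: B1 and B2 slow to half their rates while the A tasks and the A:B aggregate ratio are untouched.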

Page 21

Thread Scheduling for Cache Locality

Traditionally, each resource (CPU, memory, I/O) has been managed separately

Resources are not independent, however: the policy for one resource can affect how another resource is used. For instance, the order in which threads are scheduled can affect the performance of the memory subsystem

Neat paper that uses a very simple scheduling idea to enhance memory performance

Page 22

Main Idea

When working with a large array, we want to tile (block) for efficient use of the cache

What is tiling? Restructuring loops for data re-use

Tiling by hand is a pain and is error-prone

Compilers can tile automatically, but not always: for instance, when the program contains dynamically allocated or indirectly accessed data

So, use threads and hints to improve cache utilization
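For concreteness, here is what tiling looks like by hand on a simple traversal (a Python sketch of the loop restructuring only; the tile size B is an assumption that would be tuned to the cache in practice):

```python
# Sum each column of an n x n row-major matrix.
# The naive column-order loop strides through memory; the tiled loop
# revisits a narrow band of B columns across all rows, so data fetched
# for one row segment is reused before it is evicted from the cache.
def column_sums_naive(m):
    n = len(m)
    return [sum(m[i][j] for i in range(n)) for j in range(n)]

def column_sums_tiled(m, B=64):
    n = len(m)
    sums = [0] * n
    for jj in range(0, n, B):                    # loop over column tiles
        for i in range(n):                       # all rows for this tile
            for j in range(jj, min(jj + B, n)):  # B columns at a time
                sums[j] += m[i][j]
    return sums

m = [[i * 5 + j for j in range(5)] for i in range(5)]
assert column_sums_tiled(m, B=2) == column_sums_naive(m)
```

Getting the tile bounds (`min(jj + B, n)`) right for every loop nest is exactly the error-prone bookkeeping the slide refers to.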

Page 23

Example

Thread ti is denoted by ti(ai1, …, aik), where aij is the address of the jth piece of data referenced by thread ti. Simplify by using just 2 or 3 addresses or hints. Hints might be elements of rows or columns of a matrix, for example.

Page 24

Algorithm

The hash algorithm should assign threads to bins so that threads with similar hints fall in the same bin

Threads in each bin are scheduled together

With 2 hints, the bins form a 2-D plane

Key insight: the sum of the two dimensions of a bin is less than the cache size C

Easy to extend to k hints
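A sketch of the binning idea (Python; the hash function and bin width are my assumptions, and the paper's exact scheme may differ): map each thread's two hint addresses to a 2-D bin, then run the threads of a bin back to back so their data overlaps in the cache.

```python
from collections import defaultdict

def bin_threads(threads, bin_width):
    """threads: list of (thread_id, hint1, hint2), where hints are data
    addresses. Threads whose hints fall in the same (hint1, hint2) region
    share a bin and are scheduled consecutively."""
    bins = defaultdict(list)
    for tid, h1, h2 in threads:
        bins[(h1 // bin_width, h2 // bin_width)].append(tid)
    return bins

# Hypothetical threads: 1 and 2 touch nearby addresses, 3 does not.
threads = [(1, 0x1000, 0x8000), (2, 0x1040, 0x8040), (3, 0x9000, 0x2000)]
for key, tids in bin_threads(threads, bin_width=0x1000).items():
    print(key, tids)    # threads 1 and 2 land in the same bin
```

Extending to k hints just means using a k-tuple of quantized hints as the bin key.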

Page 25

Performance

Fork = create and schedule a null thread

Run = execute and terminate a null thread

Page 26

More Complex Examples

Partial differential equation solver

Page 27

Multiprocessor Scheduling

Load sharing: single ready queue; an idle processor dequeues the thread at the front of the queue; preempted threads are placed at the end of the queue

Gang scheduling: all threads belonging to an application run at the same time

Hardware partitions: a chunk of the machine is dedicated to each application

Advantages and disadvantages?

Page 28

Multiprocessor Scheduling

Load sharing: poor locality; poor synchronization behavior; simple; good processor utilization. Affinity or per-processor queues can improve locality.

Gang scheduling: central control; fragmentation – unnecessary processor idle times (e.g., two applications with P/2+1 threads); good synchronization behavior; if careful, good locality

Hardware partitions: poor utilization for I/O-intensive applications; fragmentation – unnecessary processor idle times when the partitions left are small; good locality and synchronization behavior
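The P/2+1 fragmentation example can be made concrete (a sketch; the helper is hypothetical): on P processors, two gangs of P/2+1 threads cannot be co-scheduled, so each time slice runs only one gang and leaves P/2-1 processors idle.

```python
def gang_utilization(P, gang_size):
    """Processor utilization per time slice when only one gang of
    `gang_size` threads fits on the P processors at a time."""
    return gang_size / P

# Two applications with P/2 + 1 threads each on a 16-processor machine:
# neither pair of gangs fits together, so utilization caps at 9/16.
P = 16
print(gang_utilization(P, P // 2 + 1))   # 9/16 = 0.5625
```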