Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... ·...

27
Parallel Job Schedulings Thoai Nam

Transcript of Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... ·...

Page 1: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Parallel Job Schedulings

Thoai Nam

Page 2: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Scheduling on UMA

Multiprocessors

Schedule:

allocation of tasks to processors

Dynamic scheduling

– A single queue of ready processes

– A physical processor accesses the queue to run the next

process

– The binding of processes to processors is not tight

Static scheduling

– Only one process per processor

– Speedup can be predicted

Page 3: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Classes of scheduling

Static scheduling

– An application is modeled as an directed acyclic graph (DAG)

– The system is modeled as a set of homogeneous processors

– An optimal schedule: NP-complete

Scheduling in the runtime system

– Multithreads: functions for thread creation, synchronization, and

termination

– Parallelizing compilers: parallelism from the loops of the sequential

programs

Scheduling in the OS

– Multiple programs must co-exist in the same system

Administrative scheduling

Page 4: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Deterministic model

A parallel program is a collection of tasks, some of which must be completed before others begin

Deterministic model: The execution time needed

by each task and the precedence relations between tasks are fixed and known before run time

Task graph

T1 --------

2

T2 -------

3

T3 --------

1

T4 --------

2

T5 --------

3

T6 --------

3

T7 --------

1

Page 5: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Gantt chart

Gantt chart indicates the time each task

spends in execution, as well as the

processor on which it executes

T4

T3 T6

T1 T2 T5 T7

1 2 3 4 5 6 7 8 9

T1 --------

2

T2 -------

3

T3 --------

1

T4 --------

2

T5 --------

3

T6 --------

3

T7 --------

1

Time

Processors

Page 6: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Optimal schedule

If all of the tasks take unit time, and the task graph is a

forest (i.e., no task has more than one predecessor), then a

polynomial time algorithm exists to find an optimal schedule

If all of the tasks take unit time, and the number of

processors is two, then a polynomial time algorithm exists to

find an optimal schedule

If the task lengths vary at all, or if there are more than two

processors, then the problem of finding an optimal schedule is NP-hard.

Page 7: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Graham’s list scheduling algorithm

T = {T1, T2,…, Tn}

a set of tasks

: T (0,)

a function associates an execution time with each task

A partial order < on T

L is a list of task on T

Whenever a processor has no work to do, it instantaneously

removes from L the first ready task; that is, an unscheduled task whose predecessors under < have all completed

execution. (The processor with the lower index is prior)

Page 8: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Graham’s list scheduling algorithm

- Example

T4

T3 T6

T1 T2 T5 T7

T1 --------

2

T2 -------

3

T3 --------

1

T4 --------

2

T5 --------

3

T6 --------

3

T7 --------

1

Time

Processors

L = {T1, T2, T3, T4, T5, T6, T7}

Page 9: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Graham’s list scheduling algorithm

- Problem

T1 T9

T2 T4 T5 T7

T3 T6 T8

T1 T8

T2 T5 T9

T3 T6

T4 T7

T1 --------

3

T9 --------

9

T2 --------

2

T3 --------

2

T4 --------

2

T5 --------

4

T6 --------

4

T7 --------

4

T8 --------

4 L = {T1, T2, T3, T4, T5, T6, T7, T8, T9}

Page 10: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Coffman-Graham’s scheduling

algorithm (1)

Graham’s list scheduling algorithm depends upon a

prioritized list of tasks to execute

Coffman and Graham (1972) construct a list of tasks for the

simple case when all tasks take the same amount of time.

Page 11: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Coffman-Graham’s scheduling

algorithm (2)

Let T = T1, T2,…, Tn be a set of n unit-time tasks to be

executed on p processors

If Ti < Tj, then task is Ti an immediate predecessor of task Tj,

and Tj is an immediate successor of task Ti

Let S(Ti) denote the set of immediate successor of task Ti

Let (Ti) be an integer label assigned to Ti.

N(T) denotes the decreasing sequence of integers formed

by ordering of the set {(T’)| T’ S(T)}

Page 12: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Coffman-Graham’s scheduling

algorithm (3)

1. Choose an arbitrary task Tk from T such that S(Tk) = 0, and define (Tk)

to be 1

2. for i 2 to n do

a. R be the set of unlabeled tasks with no unlabeled successors

b. Let T* be the task in R such that N(T*) is lexicographically smaller

than N(T) for all T in R

c. Let (T*) i

endfor

3. Construct a list of tasks L = {Un, Un-1,…, U2, U1} such that (Ui) = i for all i

where 1 i n

4. Given (T, <, L), use Graham’s list scheduling algorithm to schedule the

tasks in T

Page 13: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Coffman-Graham’s scheduling

algorithm – Example (1)

T1

T3

T4

T5

T7

T8

T9

T2

T6

T2 T6 T4 T7 T9

T1 T3 T8

T5

Page 14: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Coffman-Graham’s scheduling

algorithm – Example (2)

Step1 of algorithm

task T9 is the only task with no immediate successor. Assign 1 to (T9)

Step2 of algorithm

i=2: R = {T7, T8}, N(T7)= {1} and N(T8)= {1} Arbitrarily choose task T7

and assign 2 to (T7)

i=3: R = {T3, T4, T5, T8}, N(T3)= {2}, N(T4)= {2}, N(T5)= {2} and N(T8)= {1}

Choose task T8 and assign 3 to (T8)

i=4: R = {T3, T4, T5, T6}, N(T3)= {2}, N(T4)= {2}, N(T5)= {2} and N(T6)= {3}

Arbitrarily choose task T4 and assign 4 to (T4)

i=5: R = {T3, T5, T6}, N(T3)= {2}, N(T5)= {2} and N(T6)= {3} Arbitrarily

choose task T5 and assign 5 to (T5)

i=6: R = {T3, T6}, N(T3)= {2} and N(T6)= {3} Choose task T3 and assign 6

to (T3)

Page 15: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Coffman-Graham’s scheduling

algorithm – Example (3)

i=7: R = {T1, T6}, N(T1)= {6, 5, 4} and N(T6)= {3} Choose task T6 and

assign 7 to (T6)

i=8: R = {T1, T2}, N(T1)= {6, 5, 4} and N(T2)= {7} Choose task T1 and

assign 8 to (T1)

i=9: R = {T2}, N(T2)= {7} Choose task T2 and assign 9 to (T2)

Step 3 of algorithm

L = {T2, T1, T6, T3, T5, T4, T8, T7, T9}

Step 4 of algorithm

Schedule is the result of applying Graham’s list-scheduling algorithm to

task graph T and list L

Page 16: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Issues in processor scheduling

Preemption inside spinlock-controlled critical sections

Cache corruption

Context switching overhead

Enter

Critical Section

Exit

P0

Enter

Critical Section

Exit

P1

Enter

Critical Section

Exit

P2

Page 17: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Current approaches

Global queue

Variable partitioning

Dynamic partitioning with two-level scheduling

Gang scheduling

Page 18: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Global queue

A copy of uni-processor system on each node, while sharing

the main data structures, specifically the run queue

Used in small-scale bus-based UMA shared memory

machines such as Sequent multiprocessors, SGI

multiprocessor workstations and Mach OS

Autonamic load sharing

Cache corruption

Preemption inside spinlock-controlled critical sections

Page 19: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Variable partitioning

Processors are partitioned into disjoined sets and each job is

run only in a distinct partition

Distributed memory machines: Intel and nCube hypercudes,

IBM PS2, Intel Paragon, Cray T3D

Problem: fragmentation, big jobs

Scheme

Parameters taken into account

User request System load Changes

Fixed no no no

Variable yes no no

Adaptive yes yes no

Dynamic yes yes yes

Page 20: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Dynamic partitioning with

two-level scheduling

Changes in allocation during execution

Workpile model:

– The work = an unordered pile of tasks or chores

– The computation = a set of worker threads, one per processor, that

take one chore at time from the work pile

– Allowing for the adjustment to different numbers of processors by

changing the number of the wokers

– Two-level scheduling scheme: the OS deals with the allocation of

processors to jobs, while applications handle the scheduling of chores

on those processors

Page 21: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Gang scheduling

Problem: Interactive response times time slicing

– Global queue: uncoordinated manner

Observartion:

– Coordinated scheduling in only needed if the job’s threads interact

frequently

– The rare of interaction can be used to drive the grouping of threads

into gangs

Samples:

– Co-scheduling

– Family scheduling: which allows more threads than processors and

uses a second level of internal time slicing

Page 22: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Several specific

scheduling methods

Co-scheduling

Smart scheduling [Zahorijan et al.]

Scheduling in the NYU Ultracomputer [Elter et al.]

Affinity based scheduling

Scheduling in the Mach OS

Page 23: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Co-Scheduling

Context switching between applications rather then between

tasks of several applications.

Solving the problem of “preemption inside spinlock-controlled

critical sections”.

Cache corruption???

Page 24: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Smart scheduling

Advoiding:

(1) preempting a task when it is inside its critical section

(2) rescheduling tasks that were busy-waiting at the time of

their preemption until the task that is executing the

corresponding critical section releases it.

The problem of “preemption inside spinlock-controlled critical

sections” is solved.

Cache corruption???.

Page 25: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Scheduling in

the NYU Ultracomputer

Tasks can be formed into groups

Tasks in a group can be scheduled in any of the following

ways:

– A task can be scheduled or preempted in the normal manner

– All the tasks in a group are scheduled or preempted simultaneously

– Tasks in a group are never preempted.

In addition, a task can prevent its preemption irrespective of

the scheduling policy (one of the above three) of its group.

Page 26: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Affinity based scheduling

Policity: a tasks is scheduled on the processor where it last

executed [Lazowska and Squillante]

Alleviating the problem of cache corruption

Problem: load imbalance

Page 27: Parallel Job Schedulings - University of Technologyhungnq/courses/pp/ParallelProcessing_9... · Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classes of scheduling

Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

Threads

Processor sets: disjoin

Processors in a processor set is assigned a subset of threads

for execution.

– Priority scheduling: LQ, GQ(0),…,GQ(31)

– LQ and GQ(0-31) are empty: the processor executes an special idle

thread until a thread becomes ready.

– Preemption: if an equal or higher priority ready thread is present

Scheduling in the Mach OS

0

1

31

P0

P1

Pn

Global

queue

(GQ)

Local

queue

(LQ)