Scheduling Mixed Parallel Applications with Reservations
Henri Casanova, Information and Computer Science Dept.
University of Hawai`i at Manoa
Mixed Parallelism
Both task- and data-parallelism: “malleable tasks with precedence constraints”
[Figure: a DAG of malleable tasks scheduled as rectangles in a processors-vs-time chart]
Mixed Parallelism
Mixed parallelism arises in many applications, many of them scientific workflows
Example: image-processing applications that apply a graph of data-parallel filters, e.g., [Hastings et al., 2003]
Many workflow toolkits support mixed-parallel applications, e.g., [Stef-Praun et al., 2007], [Kanazawa, 2005], [Hunold et al., 2003]
Mixed-Parallel Scheduling
Mixed-parallel scheduling has been studied by several researchers
NP-hard, with guaranteed algorithms [Lepère et al., 2001] [Jansen et al., 2006]
Several heuristics have been proposed in the literature:
One-step algorithms [Boudet et al., 2003] [Vydyanathan et al., 2006]
• Task allocation and task mapping decisions happen concurrently
Two-step algorithms [Radulescu et al., 2001] [Bandala et al., 2006] [Rauber et al., 1998] [Suter et al., 2007]
• First, compute task allocations
• Second, map tasks to processors using some standard list-scheduling approach
The Allocation Problem
We can give each task very few (one?) processors:
each task runs for a long time, but we can run many tasks in parallel
We can give each task many (all?) processors:
each task runs quickly, but typically with diminishing returns due to parallel efficiencies below 1, and we can’t run many tasks in parallel
Trade-off: parallelism vs. task execution times
Question: how do we achieve a good trade-off?
Critical Path and Work
[Figure: a schedule drawn as rectangles in a processors-vs-time chart]
Two constraints:
Makespan × #procs ≥ total work
Makespan ≥ critical path length
total work = sum of rectangle areas; critical path length = execution time of the longest path in the DAG
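The two bounds can be computed directly from the DAG. Below is a minimal Python sketch of that computation; the data layout (dicts for allocations, execution times, and successor lists) and the function name are illustrative choices of mine, not from the talk.

```python
def makespan_lower_bound(alloc, exec_time, succ, n_procs):
    """Lower bound on makespan from the two slide constraints (sketch).
    alloc[t]: processors given to task t; exec_time[t]: its run time with
    that allocation; succ[t]: list of child tasks in the DAG."""
    # total work = sum of rectangle areas (procs x time)
    total_work = sum(alloc[t] * exec_time[t] for t in alloc)
    bl = {}  # bottom level: longest path (in time) starting at each task
    def rec(t):
        if t not in bl:
            bl[t] = exec_time[t] + max(
                (rec(c) for c in succ.get(t, [])), default=0.0)
        return bl[t]
    cp_length = max(rec(t) for t in alloc)
    # makespan >= total work / #procs  and  makespan >= CP length
    return max(total_work / n_procs, cp_length)
```

For a chain-free DAG the work bound dominates; for a long chain the critical-path bound does.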
Work vs. CP Trade-off
[Figure: as task allocations grow from small to large, the critical path length decreases while total work / #procs increases; the best lower bound on the makespan is where the two curves cross]
The CPA 2-Step Algorithm
Original algorithm [Radulescu et al., 2001], for a homogeneous platform:
Start by allocating 1 processor to all tasks
Then pick a task and increase its allocation by 1 processor
• picking the task that benefits the most from one extra processor, in terms of execution time
Repeat until the critical path length and the total work / #procs become approximately equal
Improved algorithm [Suter et al., 2007]: uses an empirically better stopping criterion
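The allocation phase just described can be sketched as follows, using Amdahl's-law task times (the model adopted later in this deck). The data layout and the way the critical path is walked are simplifications of mine, not the paper's exact formulation.

```python
def cpa_allocations(seq_time, alpha, succ, n_procs):
    """Allocation phase of CPA (simplified sketch).
    seq_time[v]: 1-processor time of task v; alpha[v]: its non-parallelizable
    fraction (Amdahl's law); succ[v]: list of children in the DAG."""
    t = lambda v, p: seq_time[v] * (alpha[v] + (1.0 - alpha[v]) / p)
    alloc = {v: 1 for v in seq_time}          # start with 1 proc per task
    while True:
        bl = {}  # bottom levels under the current allocations
        def rec(v):
            if v not in bl:
                bl[v] = t(v, alloc[v]) + max(
                    (rec(c) for c in succ.get(v, [])), default=0.0)
            return bl[v]
        for v in seq_time:
            rec(v)
        cp = max(bl.values())
        avg_work = sum(alloc[v] * t(v, alloc[v]) for v in seq_time) / n_procs
        if cp <= avg_work:
            break  # stopping criterion: CP length ~ total work / #procs
        # walk the critical path, starting from its entry task
        cur, path = max(bl, key=bl.get), []
        while cur is not None:
            path.append(cur)
            kids = succ.get(cur, [])
            cur = max(kids, key=bl.get) if kids else None
        # grow the allocation of the CP task that benefits most
        growable = [v for v in path if alloc[v] < n_procs]
        if not growable:
            break
        best = max(growable, key=lambda v: t(v, alloc[v]) - t(v, alloc[v] + 1))
        alloc[best] += 1
    return alloc
```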
Presentation Outline
Mixed-Parallel Scheduling
The Scheduling Problem with Reservations
Models and Assumptions
Algorithms for Minimizing Makespan
Algorithms for Meeting a Deadline
Conclusion
Batch Scheduling and Reservations
Platforms are shared by users, today typically by batch schedulers
Batch schedulers have known drawbacks: non-deterministic queue waiting times
In many scenarios, one needs guarantees regarding application completion times
As a result, most batch schedulers today support advance reservations: one can acquire a reservation for some number of processors and for some period of time
Reservations
[Figure: reservations as rectangles in a processors-vs-time chart]
We have to schedule around the holes in the reservation schedule
Reservations
[Figure: each application task scheduled in its own rectangle among the holes]
One reservation per task
Complexity
The makespan minimization problem is NP-hard at several levels (and thus so is meeting a deadline):
Mixed-parallel scheduling is NP-hard
• Guaranteed algorithms [Lepère et al., 2001] [Jansen et al., 2006]
Scheduling independent tasks with reservations is NP-hard and unapproximable in general [Eyraud-Dubois et al., 2007]
• Guaranteed algorithms exist under restrictions
Guaranteed algorithms for mixed-parallel scheduling with reservations are an open problem
In this work we focus on developing heuristics
Presentation Outline
Mixed-Parallel Scheduling
The Scheduling Problem with Reservations
Models and Assumptions
Algorithms for Minimizing Makespan
Algorithms for Meeting a Deadline
Conclusion
Models and Assumptions
Application:
We assume that the application is fully specified and static
• Conservative reservations can be used to be safe
Random DAGs are generated using the method in [Suter et al., 2007]
Data-parallelism is modeled based on Amdahl’s law
Platform:
We assume that the reservation schedule does not change while we compute the schedule
We assume that we know the reservation schedule
• Sometimes not enabled by cluster administrators
We ignore communication between tasks
• Since a parent task may complete well before one of its children can start, data must be written to disk anyway
• Can be modeled via task execution time and/or the Amdahl’s-law parameter
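The Amdahl's-law task model mentioned above can be written down in one line; the function name and signature are illustrative.

```python
def task_time(seq_time, alpha, procs):
    """Amdahl's-law model of a data-parallel task: alpha is the
    non-parallelizable fraction, so on p processors the task takes
    seq_time * (alpha + (1 - alpha) / p)."""
    return seq_time * (alpha + (1.0 - alpha) / procs)
```

With alpha > 0, parallel efficiency (speedup / procs) drops below 1 as processors are added, which is exactly the diminishing-returns trade-off discussed earlier.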
Minimizing Makespan
Natural approach: adapt the CPA algorithm
It’s a simple algorithm:
• First phase: compute allocations
• Second phase: list-scheduling
Problem: allocations are computed without considering reservations
Considering reservations would involve considering time, which is only done in the second phase
Greedy approach:
Sort the tasks by decreasing bottom-level
For each task in this order, determine the best feasible processor allocation
• i.e., the one that has the earliest completion time
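One greedy step can be sketched as follows, assuming the holes left by reservations are given as sorted, contiguous (start, end, free-processors) intervals; this representation and the function name are assumptions of mine, not from the talk.

```python
def best_allocation(intervals, exec_time, ready, max_procs):
    """One greedy step (sketch): among all processor counts, pick the one
    whose earliest feasible start yields the earliest completion time.
    intervals: sorted, contiguous (start, end, free_procs) tuples;
    exec_time(p): task time on p processors; ready: earliest time the task
    may start (all parents done).
    Returns (completion_time, procs, start_time), or None if nothing fits."""
    best = None
    for p in range(1, max_procs + 1):
        d = exec_time(p)
        run_start = None  # start of the current run of intervals with >= p free
        for lo, hi, free in intervals:
            if free < p:
                run_start = None  # the run is broken; restart later
                continue
            if run_start is None:
                run_start = lo
            s = max(ready, run_start)
            if s + d <= hi:  # the task fits before this run (so far) ends
                if best is None or s + d < best[0]:
                    best = (s + d, p, s)
                break  # earliest fit for this p found
    return best
```

Note how a larger allocation may finish faster even though it has to wait for a wider hole.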
Example
[Figure: reservation holes in a processors-vs-time chart; tasks A, B, C, and D are placed greedily. For the current task, the possible configurations (more processors, shorter duration) are compared, and the feasible one with the earliest completion time is chosen.]
Computing Bottom-Levels
Problem:
Computing bottom levels (BLs) requires that we know task execution times
Task execution times depend on allocations
But we compute the allocations only after using the bottom levels
We compare four ways to compute BLs:
use 1-processor allocations
use “all”-processor allocations
use CPA-computed allocations, using all processors
use CPA-computed allocations, using the historical average number of non-reserved processors
We find that the 4th method is marginally better: it wins in 78.4% of our simulations (more details on simulations later)
All results hereafter use this method for computing BLs
Bounding Allocations
A known problem with such a greedy approach is that allocations are too large: the reduction in parallelism ends up being detrimental to the makespan
Let’s try to bound allocations. Three methods:
BD_HALF: bound to half of the processors
BD_CPA: bound by the allocations in the CPA schedule computed using all processors
BD_CPAR: bound by the allocations in the CPA schedule computed using the historical average number of non-reserved processors
Reservation Schedule Model?
We conduct our experiments in simulation: cheap, repeatable, controllable
We need to simulate environments for given reservation schedules
Question: what does a typical reservation schedule look like?
Answer: we don’t really know yet
There is no “reservation schedule” archive
Let’s look at what people have done in the past...
Synthetic Reservation Schedules
We have schedules of batch jobs, e.g., the “parallel workload archive” by D. Feitelson
Typical approach, e.g., in [Smith et al., 2000]:
Take a batch job schedule
Mark some jobs as “reserved”
Remove all other jobs
Problem: the amount of reservation is approximately constant over time, while in the real world we expect it to be approximately decreasing
And we do see this decreasing behavior in a real-world 2.5-year trace from the Grid5K platform
We should generate reservation schedules in which the amount of reservation decreases with time
Synthetic Reservation Schedules
Three methods to “drop” reservations after the simulated application start time:
Linearly or exponentially
• so that there are no reservations after 7 days
Based on job submission time
Preliminary evaluations indicate that the exponential method leads to schedules that are the most correlated with the Grid5K data
For 4 logs from the “parallel workload archive”
But this is not conclusive because we have only one (good) data set at this point
We run simulations with the 4 logs, the 3 methods above, and the Grid5K data
Bottom line for this work: for our purposes, we observe no discrepancies in our results across any of the above
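One possible reading of the exponential "drop" method is sketched below; the decay rate and the data layout are my assumptions, not values from the talk.

```python
import math
import random

def drop_reservations(reservations, app_start, horizon=7 * 24 * 3600.0):
    """Sketch of the exponential 'drop' idea: a reservation that starts t
    seconds after the simulated application start is kept with exponentially
    decaying probability, and none survive past the 7-day horizon.
    reservations: list of (start, end, procs). The decay rate (5.0) is an
    illustrative choice."""
    kept = []
    for start, end, procs in reservations:
        t = start - app_start
        if t <= 0:                       # already under way: always kept
            kept.append((start, end, procs))
        elif t < horizon and random.random() < math.exp(-5.0 * t / horizon):
            kept.append((start, end, procs))
    return kept
```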
Simulation Procedure
We use 40 application specifications (DAG size, width, regularity, etc.), 20 samples of each
We use 36 reservation schedule specifications (batch log, generation method, etc.), 50 samples of each
Total: 1,440 × 1,000 = 1,440,000 experiments
Two metrics:
Makespan
CPU-hours consumption
Simulation Results

Algorithm    Makespan                        CPU-hours
             avg. deg.         # of         avg. deg.         # of
             from best         wins         from best         wins
BD_ALL       33.75%            36           42.48%            0
BD_HALF      28.38%            3            37.83%            1
BD_CPA       0.29%             1,026        0.75%             6
BD_CPAR      0.21%             386          0.00%             1,434

Similar results for the Grid5K reservation schedules
Presentation Outline
Mixed-Parallel Scheduling
The Scheduling Problem with Reservations
Models and Assumptions
Algorithms for Minimizing Makespan
Algorithms for Meeting a Deadline
Conclusion
Meeting a Deadline
A simple approach to meeting a deadline is to schedule backwards from the deadline, picking tasks by increasing bottom-level
The way to be as safe as possible is to find, for each task, the feasible allocation that starts as late as possible, given that:
The exit task must complete before the deadline
Each task must complete before all of its children begin
Let’s see this on a simple example
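The backward pass can be sketched as follows, ignoring reservations for brevity (the real algorithm also searches for a feasible, as-late-as-possible allocation among the holes); the data layout is illustrative.

```python
def backward_schedule(tasks, succ, exec_time, deadline):
    """Skeleton of the backward pass (sketch): tasks are placed in increasing
    bottom-level order, each as late as its children and the deadline allow.
    exec_time[v]: the task's duration under its chosen allocation.
    Returns {task: (start, finish)}."""
    bl = {}  # bottom level = longest remaining path starting at v
    def rec(v):
        if v not in bl:
            bl[v] = exec_time[v] + max(
                (rec(c) for c in succ.get(v, [])), default=0.0)
        return bl[v]
    for v in tasks:
        rec(v)
    placed = {}
    for v in sorted(tasks, key=lambda v: bl[v]):  # children come first
        # latest finish: before the deadline and before every child starts
        finish = min([placed[c][0] for c in succ.get(v, [])] + [deadline])
        placed[v] = (finish - exec_time[v], finish)
    return placed
```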
Meeting a Deadline Example
[Animation, spanning several slides: two small task graphs, “Task 1” and “Task 2” (each with nodes A–E), are scheduled backwards from the deadline among the holes of the reservation schedule. Each node’s possible configurations (processor count × duration) are shown; nodes are placed one at a time, each given the feasible allocation that starts as late as possible while completing before its children start. All of Task 2 is placed first, then Task 1.]
Algorithms
We can employ the same techniques for bounding allocations as in the makespan minimization algorithms: BD_ALL, BD_HALF, BD_CPA, BD_CPAR
Problem: these algorithms do not consider the tightness of the deadline
If the deadline is loose, the above algorithms will consume unnecessarily many CPU-hours
For a very loose deadline there should be no data-parallelism, and thus no parallel-efficiency loss due to Amdahl’s law
Question: how can we reason about deadline tightness?
Deadline Tightness
For each task we have a choice of allocations:
Ones that use too many processors may be wasteful
Ones that use too few processors may be dangerous
Idea:
Consider the CPA-computed schedule assuming an empty reservation schedule
• using all processors, or the historical average number of non-reserved processors
Determine when the task would start in that schedule, i.e., at which fraction of the overall makespan
Pick the allocation that allows the task to start at the same fraction of the time interval between “now” and the deadline
Matching the CPA Schedule
[Figure, built up over three slides: in the CPA schedule on q processors, the task starts after time a, with time b remaining until the end of the schedule; in the schedule with reservations on p processors, a candidate allocation starts after time c, with time d remaining until the task’s “deadline”.]
Pick the cheapest allocation such that: b / (a+b) > d / (c+d)
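The criterion can be sketched as a small selection routine. The interpretation of a, b, c, d follows the figure; the per-task map from processor counts to latest feasible start times is a hypothetical input, and the whole routine is my reading of the slide, not the paper's code.

```python
def rc_allocation(cpa_start, cpa_makespan, now, deadline, latest_starts):
    """RC allocation choice (sketch). In the empty-platform CPA schedule the
    task starts after a = cpa_start out of a + b = cpa_makespan time units.
    latest_starts: {procs: latest feasible start of that allocation}.
    Return the cheapest allocation satisfying b/(a+b) > d/(c+d)."""
    a = cpa_start
    b = cpa_makespan - cpa_start
    for procs in sorted(latest_starts):   # cheapest allocations first
        start = latest_starts[procs]
        c = start - now                   # elapsed before the task starts
        d = deadline - start              # time left at the task's start
        if d <= 0:
            continue                      # would miss the deadline
        if b / (a + b) > d / (c + d):
            return procs
    return None  # no candidate starts late enough relative to the CPA schedule
```

Intuitively, a small allocation runs long and must start early (large d/(c+d)), failing the test; the cheapest allocation that can start at the CPA-matching fraction wins.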
Simulation Experiments
We call this new approach “resource conservative” (RC)
We conduct simulations similar to those for the makespan minimization algorithms
Issue: the RC approach can be in trouble when it schedules the first tasks, if the reservation schedule is non-stationary and/or tight
Could be addressed via some tunable parameter (e.g., pick an allocation that starts at least x% after the scaled CPA start time)
We do not use such a parameter in our results
We use two metrics:
Tightest deadline achieved
• necessary because deadline tightness depends on the instance
• determined via binary search
CPU-hours consumed for a deadline that is 50% later than the tightest deadline
Simulation Results

             Tightest deadline                       CPU-hours consumed
             (avg. degradation from best)            for a loose deadline
Algorithm    sparse   medium   tight    Grid5K      sparse  medium  tight   Grid5K
BD_ALL       178%     175%     188%     227%        3556    3486    3768    2006
BD_CPAR      6.52%    6.44%    6.91%    8.38%       231     236     243     179
RC_CPA       13.17%   13.27%   17.36%   19.51%      6.39    6.80    7.98    2.15
RC_CPAR      4.12%    4.27%    8.26%    15.14%      0.16    0.15    0.16    0.09

(“sparse”, “medium”, “tight”, and “Grid5K” are reservation schedule types)
Conclusions
Makespan minimization:
Bounding task allocations based on the CPA schedule works well
Meeting a deadline:
Using the CPA schedule to determine task start times works well, at least when the reservation schedule isn’t too tight
• Some tuning parameter may help for tight schedules
• Or, one can use the same approach as for makespan minimization, but backwards
In both cases, using the historical number of unreserved processors leads to marginal improvements
Possible Future Directions
Use a recent one-step algorithm instead of CPA: iCASLB [Vydyanathan, 2006]
Experiments in a real-world setting
What kind of interface should a batch scheduler expose if the full reservation schedule must remain hidden?
A reservation schedule archive: needs to be a community effort

Scheduling Mixed-Parallel Applications with Advance Reservations, Kento Aida and Henri Casanova, to appear in Proc. of HPDC 2008

Questions?