Applying Scheduling and Tuning to On-line Parallel Tomography
Shava Smallen, Indiana University
Henri Casanova, Francine Berman, University of California at San Diego / San Diego Supercomputer Center
Outline
1. Introduction to On-line Parallel Tomography
2. Tunable On-line Parallel Tomography
3. User-directed application-level scheduler
4. Experiments
5. Summary
What is tomography?
• Tomography: a method for reconstructing the interior of an object from its projections
• National Center for Microscopy and Imaging Research (NCMIR)
  – Electron microscopy
[Figure: electron microscope]

Example
[Figure: tomogram of spiny dendrite (images courtesy of Steve Lamont)]
• Compute- and data-intensive
  – E.g., 2k x 2k dataset (pixels):
    – 2k units of work (slices)
    – Total input data size: 976 MB
    – Total output data size: 9.6 GB
    – Compute time: ~16 days on a standard workstation
• Off-line:
  1. Data collection
  2. Data processing
  3. Data viewing
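A quick back-of-the-envelope check makes "compute- and data-intensive" concrete per unit of work. This is a sketch under stated assumptions: 1 GB is taken as 1000 MB, and work, input, and output are assumed to be spread evenly over the 2048 slices.

```python
# Per-slice costs for the 2k x 2k example.
# Assumptions (not from the slides): 1 GB = 1000 MB, and input, output,
# and compute are divided evenly across the 2048 slices.
SLICES = 2048            # "2k units of work (slices)"
INPUT_MB = 976.0         # total input data size
OUTPUT_MB = 9.6 * 1000   # total output data size (9.6 GB)
COMPUTE_DAYS = 16        # ~16 days on a standard workstation

def per_slice():
    """Return (input MB, output MB, compute minutes) for one slice."""
    compute_min = COMPUTE_DAYS * 24 * 60
    return INPUT_MB / SLICES, OUTPUT_MB / SLICES, compute_min / SLICES

if __name__ == "__main__":
    inp, out, minutes = per_slice()
    print(f"input per slice:   {inp:.2f} MB")
    print(f"output per slice:  {out:.2f} MB")
    print(f"compute per slice: {minutes:.2f} min")
```

Each slice costs roughly 11 minutes of workstation time, and because the slices are independent units of work, spreading them over many processors is what makes an on-line version conceivable.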
On-line Parallel Tomography
• Provide interactive, soft real-time feedback on the quality of data acquisition
  – High tomogram resolution and frequent refreshes
• Efficiency benefits for users and the microscope
[Figure: on-line parallel tomography]
NCMIR Compute Platform
• Distributed, multi-user, heterogeneous Grid
  – Blue Horizon (SDSC): 1152 procs (AIX, LoadLeveler, Maui Scheduler)
  – NCMIR cluster: SGI Indigo2, SGI Octane (IRIX); Sun Ultra, Sun Enterprise (Solaris)
  – Meteor cluster (SDSC): Pentium III dual procs (Linux)
[Figure: the resources above connected by the network]
Application Tunability
• On-line parallel tomography is a tunable application [Chang et al.]
  – Alternate configurations are available, differing in resource utilization and output
• On-line parallel tomography output:
  – Tomogram resolution
  – Refresh frequency
• Tunability is controlled by a configuration pair (f, r), where
  – f is the reduction factor (tomogram resolution)
  – r is the number of projections per refresh (refresh frequency)
  – E.g., (2, 3)
[Figure: on-line parallel tomography with a reduce(f) step]
Tunability/Scheduling
• At run-time, we need to find out which configuration pairs are feasible, given:
  – Flexibility to trade off f against r, e.g., (2, 3) or (3, 2)
  – Resource availability
  – User bounds, e.g.:
    • Refresh at least once every 10 minutes
    • Minimum image resolution of 256 x 256 pixels
• A configuration pair is feasible if we can find a corresponding schedule
• We choose an adaptive-scheduling approach
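The user bounds above prune the space of (f, r) pairs before any scheduling is attempted. The sketch below assumes that f divides each dimension of the 2048-pixel tomogram and that projections arrive at a fixed, hypothetical acquisition interval; the function names and the candidate ranges for f and r are illustrative, not taken from the original system.

```python
from itertools import product
from typing import List, Tuple

FULL_SIDE = 2048  # 2k x 2k dataset

def tomogram_side(f: int, full: int = FULL_SIDE) -> int:
    """Tomogram side length in pixels, assuming f divides each dimension."""
    return full // f

def refresh_period_s(r: int, projection_interval_s: float) -> float:
    """Time between refreshes when the tomogram is refreshed every r projections."""
    return r * projection_interval_s

def pairs_within_user_bounds(projection_interval_s: float,
                             max_refresh_s: float = 600.0,  # refresh at least every 10 minutes
                             min_side: int = 256,           # at least 256 x 256 pixels
                             f_values=range(1, 9),          # hypothetical candidate range
                             r_values=range(1, 11),         # hypothetical candidate range
                             ) -> List[Tuple[int, int]]:
    """(f, r) pairs that satisfy the user's bounds.

    A surviving pair is only a candidate: it becomes feasible once a
    schedule for it is found on the currently available resources.
    """
    return [(f, r) for f, r in product(f_values, r_values)
            if tomogram_side(f) >= min_side
            and refresh_period_s(r, projection_interval_s) <= max_refresh_s]
```

With a hypothetical acquisition interval of 120 s per projection, the 10-minute bound caps r at 5 and the 256-pixel bound caps f at 8, leaving 40 candidate pairs, including both (2, 3) and (3, 2).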
Application-Level Scheduler (AppLeS)
AppLeS + application = self-scheduling application
• Enable an application to adaptively schedule its execution on distributed, heterogeneous resources in order to improve performance
• Type of information used:
  – Static, e.g., application model, network topology, …
  – Dynamic, e.g., Network Weather Service (NWS): available CPU, bandwidth, …
User-directed AppLeS
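[Figure: user-directed AppLeS workflow. The user generates a request; AppLeS processes the request and tries to find a schedule for each pair. If the request is infeasible, the user adjusts it. If feasible pairs exist, they are displayed for the user to review: the user either accepts one, and on-line parallel tomography executes, or rejects all and adjusts the request.]
• User-directed AppLeS
  – Involves the user in the scheduling process
  – Flexible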
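The user-directed loop can be sketched as a small driver that alternates between the scheduler and the user. Everything here is hypothetical: `find_schedule`, `review`, and `adjust` are stand-ins for AppLeS's schedule search and the user's interactive decisions, and `max_rounds` is an added safeguard, not part of the described workflow.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # configuration pair (f, r)

def user_directed_apples(
    request: Sequence[Pair],
    find_schedule: Callable[[Pair], Optional[dict]],     # hypothetical schedule search
    review: Callable[[List[Pair]], Optional[Pair]],      # user picks a pair, or None
    adjust: Callable[[List[Pair]], List[Pair]],          # user edits the request
    max_rounds: int = 10,
) -> Optional[Tuple[Pair, dict]]:
    """Process request, display feasible pairs, and let the user decide."""
    pairs = list(request)
    for _ in range(max_rounds):
        # Process the request: a pair is feasible if a schedule exists for it.
        feasible: Dict[Pair, dict] = {}
        for pair in pairs:
            schedule = find_schedule(pair)
            if schedule is not None:
                feasible[pair] = schedule
        if feasible:
            # Display feasible pairs; the user accepts one or rejects all (None).
            choice = review(sorted(feasible))
            if choice is not None:
                return choice, feasible[choice]  # hand off to execution
        # Infeasible request, or all pairs rejected: the user adjusts the request.
        pairs = adjust(pairs)
    return None
```

Keeping the user's decisions behind callbacks makes the control flow identical whether the "user" is a person at a GUI or a scripted policy, which is convenient for simulation.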
On-line Parallel Tomography Architecture
[Figure: each projection enters the preprocessor, which sends scanlines to a set of workers; the workers send slices to the writer, which updates the tomogram.]
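The data flow in this architecture can be mimicked with a toy, single-process pipeline. This is only a sketch: a running sum stands in for the workers' actual reconstruction, the block partition of slices among workers is a hypothetical policy, and the writer is assumed to refresh the tomogram after every r projections.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Scanline = List[float]

def assign_slices(num_slices: int, workers: Sequence[str]) -> Dict[int, str]:
    """Block-partition slice indices among workers (hypothetical policy)."""
    per = -(-num_slices // len(workers))  # ceiling division
    return {y: workers[y // per] for y in range(num_slices)}

def run_pipeline(projections: Sequence[Sequence[Scanline]],
                 workers: Sequence[str], r: int) -> List[List[Scanline]]:
    """Preprocessor -> workers -> writer; returns the writer's snapshots.

    projections[k][y] is scanline y of projection k, the part of that
    projection needed to reconstruct slice y.
    """
    num_slices = len(projections[0])
    owner = assign_slices(num_slices, workers)
    partial: Dict[str, Dict[int, Scanline]] = defaultdict(dict)
    snapshots = []
    for k, projection in enumerate(projections, start=1):
        # Preprocessor: route each scanline to the worker that owns its slice.
        for y, scanline in enumerate(projection):
            w = owner[y]
            acc = partial[w].get(y, [0.0] * len(scanline))
            # Worker: fold the scanline into its slice (placeholder for backprojection).
            partial[w][y] = [a + s for a, s in zip(acc, scanline)]
        # Writer: every r projections, gather all slices and update the tomogram.
        if k % r == 0:
            tomogram = {y: list(v) for w in workers for y, v in partial[w].items()}
            snapshots.append([tomogram[y] for y in range(num_slices)])
    return snapshots
```

The refresh frequency is visible directly in the loop: a larger r means fewer snapshots and less writer traffic per projection, at the cost of staler feedback.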
Scheduling Approach
• Constrained optimization problem based on soft real-time execution
  – Compute constraint: static benchmark, dynamic CPU availability (NWS)
  – Transfer constraint: topology information (ENV), dynamic bandwidth (NWS)
• The problem is a nonlinear program
  – Exploit the small range of f to reduce it to multiple mixed integer programs, each solved via lp_solve
  – Approximate solution
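To illustrate why fixing f helps, here is a feasibility check for one (f, r) pair under a deliberately simplified, hypothetical cost model: per-slice compute time scales as 1/f², each worker receives one scanline per slice per projection and returns its slices once every r projections, and every worker has its own link. Under those assumptions the constraints are linear in the per-worker slice counts and separate by worker, so a greedy fill decides feasibility exactly. This replaces the mixed integer program and lp_solve, which are needed once shared links couple the workers' transfer constraints.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

@dataclass
class Worker:
    name: str
    bench_s: float    # static benchmark: s per full-res slice per projection, dedicated CPU
    cpu_avail: float  # predicted CPU availability in (0, 1], e.g. from NWS
    bw_mbps: float    # predicted bandwidth to this worker in MB/s, e.g. from NWS

def max_slices(w: Worker, f: int, r: int, interval_s: float,
               scanline_mb: float, slice_mb: float) -> int:
    """Most slices w can keep up with per projection interval (toy model)."""
    # Compute constraint: hypothetical 1/f^2 scaling of per-slice work.
    compute_s = w.bench_s / f**2 / w.cpu_avail
    # Transfer constraint: scanline in per projection, plus the slice sent
    # back once every r projections (amortized over those r projections).
    traffic_mb = scanline_mb / f + slice_mb / f**2 / r
    by_cpu = interval_s / compute_s
    by_net = interval_s * w.bw_mbps / traffic_mb
    return math.floor(min(by_cpu, by_net))

def find_schedule(workers: Iterable[Worker], f: int, r: int, interval_s: float,
                  scanline_mb: float, slice_mb: float,
                  full: int = 2048) -> Optional[Dict[str, int]]:
    """Slice counts per worker for (f, r), or None if the pair is infeasible."""
    remaining = full // f  # hypothetical: f also reduces the number of slices
    alloc: Dict[str, int] = {}
    for w in workers:
        take = min(remaining, max_slices(w, f, r, interval_s, scanline_mb, slice_mb))
        if take > 0:
            alloc[w.name] = take
            remaining -= take
    return alloc if remaining == 0 else None

def feasible_pairs(workers: Iterable[Worker], f_values, r_values,
                   **model) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Solve one small subproblem per fixed (f, r) and keep the feasible ones."""
    workers = list(workers)
    out = {}
    for f in f_values:
        for r in r_values:
            sched = find_schedule(workers, f, r, **model)
            if sched is not None:
                out[(f, r)] = sched
    return out
```

In this toy model a worker on a slow link can be bandwidth-bound even with a fast CPU, which is the situation the transfer constraint, and dynamic bandwidth information, is meant to capture.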
Experiments
• Goals:
  – Set 1: Scheduler results
    • Evaluate scheduler efficacy
    • Evaluate the impact of dynamic resource availability on scheduler efficacy
  – Set 2: Tunability results
    • Evaluate the usefulness of tunability
• Simulation
  – Number of experiments
  – Repeatability
NCMIR Grid
• Case study: a week of traces, May 19 – 26, 2001
  – CPU availability (NWS)
  – Bandwidth (NWS)
  – Node availability (Maui scheduler showbf)
Scheduling Strategies
• 4 scheduling strategies

                          Assumes infinite bandwidth   Uses dynamic bandwidth info
  Assumes dedicated CPU   wwa                          wwa+bw
  Uses dynamic CPU info   wwa+cpu                      AppLeS
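The four strategies differ only in which dynamic measurements the scheduler is allowed to see. The sketch below shows just that input substitution; the `Measurement` type and field names are hypothetical, and how wwa actually allocates work is not described here and is not modeled.

```python
import math
from dataclasses import dataclass

@dataclass
class Measurement:
    cpu_avail: float  # predicted CPU availability in (0, 1], e.g. from NWS
    bw_mbps: float    # predicted bandwidth in MB/s, e.g. from NWS

# Which dynamic inputs each strategy uses: (CPU info, bandwidth info).
STRATEGIES = {
    "wwa": (False, False),
    "wwa+bw": (False, True),
    "wwa+cpu": (True, False),
    "AppLeS": (True, True),
}

def scheduler_view(strategy: str, m: Measurement) -> Measurement:
    """Resource view a strategy plans with, given the current predictions."""
    use_cpu, use_bw = STRATEGIES[strategy]
    return Measurement(
        cpu_avail=m.cpu_avail if use_cpu else 1.0,   # assume a dedicated CPU
        bw_mbps=m.bw_mbps if use_bw else math.inf,   # assume infinite bandwidth
    )
```

Planning with infinite bandwidth means a transfer constraint can never bind, so strategies without bandwidth information may choose schedules that the network cannot actually sustain.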
Simtomo
• Simulates an execution of on-line parallel tomography
• Uses Simgrid, Casanova [CCGrid’2001]
  – Toolkit for evaluating scheduling algorithms
    • Tasks
    • Resources modeled using traces
  – E.g., parameter sweep applications [HCW’00]
• 2 types of simulations
  – Executed at 10-minute intervals
  – 1004 simulations x 4 schedulers
Simulation Types
1. Partially trace-driven (perfect load predictions)
2. Completely trace-driven (imperfect load predictions)
[Figure: real trace compared with the load used by each simulation type]
Performance Metric
• Relative refresh lateness

$$\text{relative refresh lateness} = \frac{\text{actual refresh period}}{\text{expected refresh period (based on } r\text{)}}$$
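The metric can be computed from the times at which the writer refreshes the tomogram. This sketch assumes that the expected period is r times a fixed projection acquisition interval; the function and parameter names are illustrative.

```python
from typing import List, Sequence

def relative_refresh_lateness(refresh_times_s: Sequence[float], r: int,
                              projection_interval_s: float) -> List[float]:
    """Actual refresh period divided by the expected one, per refresh."""
    # Assumption: expected period = r projections x acquisition interval.
    expected_s = r * projection_interval_s
    actual = [t1 - t0 for t0, t1 in zip(refresh_times_s, refresh_times_s[1:])]
    return [p / expected_s for p in actual]
```

For refreshes at 0, 120, 250, and 360 s with r = 2 and a 60 s interval, the expected period is 120 s, giving values 1.0, about 1.08, and about 0.92. A value above 1 means that refresh arrived later than the chosen pair promised.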
Scheduling Results (1) (partially trace-driven)
[Figure: results for May 19-26, 2001; annotations: "98%", "Importance of dynamic bandwidth info"]
Scheduling Results (2) (completely trace-driven)
[Figure: results for May 19-26, 2001; annotation: "57.1%"]
Tunability Results
• How often does the pair change (i.e., tune)?
  – Assume a single-user model in which the user always chooses the pair with the lowest f
  – Find the best pairs throughout the simulated week
• Snapshot of Monday, May 21st
  [Figure: best pair over time, 8:00 to 11:00; pairs shown: (3,1), (2,2), (3,2), (2,2)]
• On average, the pair changed 25% of the time
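The single-user model and the change statistic can be expressed in a few lines. Two choices here are assumptions: ties on f are broken by the lowest r, and "how often the pair changes" is measured as the fraction of consecutive scheduling intervals whose chosen pair differs. The feasible sets in the usage example are made up, not taken from the traces.

```python
from typing import Optional, Sequence, Set, Tuple

Pair = Tuple[int, int]  # (f, r)

def chosen_pair(feasible: Set[Pair]) -> Optional[Pair]:
    """Single-user model: lowest f; ties broken by lowest r (an assumption)."""
    return min(feasible) if feasible else None

def change_fraction(feasible_per_interval: Sequence[Set[Pair]]) -> float:
    """Fraction of consecutive intervals in which the chosen pair changes."""
    chosen = [chosen_pair(s) for s in feasible_per_interval]
    changes = sum(1 for prev, cur in zip(chosen, chosen[1:]) if prev != cur)
    return changes / max(1, len(chosen) - 1)
```

For example, with hypothetical feasible sets yielding the choices (3,1), (2,2), (3,2), (2,2), (2,2) over five intervals, the pair changes in 3 of 4 transitions, a change fraction of 0.75.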
Summary
• Tunable on-line parallel tomography at NCMIR
• Dynamic resource information improves scheduler efficacy
  – Dynamic bandwidth information is key
• Case for tunability in a Grid environment
Future Work
• Introduce cost as another tunable parameter: (f, r, $)
• More Grid simulations
  – Traces from various sites across the US and Europe
• Generalizing to other applications
• Rescheduling
• Production use at NCMIR
Parallel Tomography at NCMIR
• Embarrassingly parallel
[Figure: specimen in X, Y, Z coordinates, showing a projection, its scanlines, and a slice]
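The slice/scanline picture is why the problem is embarrassingly parallel: slice y depends only on scanline y of every projection, so slices can be reconstructed with no communication between them. The sketch below uses plain unfiltered, nearest-neighbour backprojection for a single tilt axis as a stand-in; NCMIR's actual reconstruction code is not described here. The thread pool only shows that slices are independent; it gives no real speed-up for pure Python, and the deployed system distributes slices across hosts instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Image = List[List[float]]  # projection[y][x]: row y is the scanline for slice y

def sinogram(projections: Sequence[Image], y: int) -> List[List[float]]:
    """Scanline y of every projection: all the data slice y depends on."""
    return [proj[y] for proj in projections]

def backproject_slice(sino: Sequence[Sequence[float]],
                      angles_deg: Sequence[float], size: int) -> Image:
    """Unfiltered, nearest-neighbour backprojection of one size x size slice."""
    c = (size - 1) / 2.0
    out = [[0.0] * size for _ in range(size)]
    for scan, theta in zip(sino, angles_deg):
        ct, st = math.cos(math.radians(theta)), math.sin(math.radians(theta))
        mid = (len(scan) - 1) / 2.0
        for z in range(size):
            for x in range(size):
                # Detector position that pixel (x, z) projects onto at this tilt.
                i = round((x - c) * ct + (z - c) * st + mid)
                if 0 <= i < len(scan):
                    out[z][x] += scan[i]
    return out

def reconstruct(projections: Sequence[Image], angles_deg: Sequence[float],
                size: int, max_workers: int = 4) -> List[Image]:
    """Reconstruct every slice independently; slices share no data."""
    num_slices = len(projections[0])

    def one(y: int) -> Image:
        return backproject_slice(sinogram(projections, y), angles_deg, size)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, range(num_slices)))
```

Because each call to `one` touches only its own sinogram, the same decomposition maps directly onto the worker pool of the on-line architecture, with the preprocessor doing the scanline routing.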
Scheduling Latency
• Time to search for feasible triples
[Figure: scheduling latency for 1k x 1k and 2k x 2k datasets]