Enabling Interactive Multi-Experiment
Computational Studies through a Permeable Runtime System-Application Interface
Ph.D. Thesis Proposal
Siu Yau
Jun 2006
Outline
• Background & Motivation
  – Computational Studies, Related Work
• Computational Systems
  – Thesis Statement, Observations
• Experimental System (SimX)
  – Components & Preliminary results
• Further Work
• Project Timeline
Background & Motivation
• Computer Simulation has become an integral part of the scientific method
• Wide-spread use of Computational Studies in science and engineering
• Much work done to speed up individual simulations
• But Computational Studies can involve 100s, 1000s, or 10000s of simulations...
Computational Studies
• Computational Studies
  – Simulation code, run multiple times
  – Parameter Space: possible inputs
  – Observation Space: measured metrics
  – Objectives
• Identify points in the Parameter Space that meet the objectives in the Observation Space
• Goal: Interactive computational studies
Bridge Design
• Simulation: Thin plate FEM bridge deformation
• Parameter: Column placement
• Observation: Construction cost, Max deflection
• Objectives: Pareto-optimal points, i.e. points not inferior to any other point in both observation metrics
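The Pareto-optimality objective can be sketched in a few lines. This is a minimal illustration, not SimX code: it assumes both observation metrics are minimised and uses the standard dominance test; the names `dominates`, `pareto_front`, and the sample designs are hypothetical.

```python
from typing import List, Tuple

# (construction cost, max deflection); both are assumed to be minimised
Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both metrics and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (simple O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical bridge designs, for illustration only
designs = [(10.0, 5.0), (12.0, 3.0), (11.0, 6.0), (15.0, 3.5), (9.0, 8.0)]
front = pareto_front(designs)
```

Here `(11.0, 6.0)` and `(15.0, 3.5)` drop out because another design is cheaper and deflects less.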
Defibrillator Design
• Simulation: Torso conductivity FEM
• Parameter: Electrode placement, shock strength
• Observation: Damage, Uniformity, Effectiveness
• Objectives: Pareto-optimal points
Response Graph
• Simulation: Transient analysis on 2D frame with time-periodic Boundary Condition
• Parameter: Frequency
• Observation: Amplitude
• Objective: Frequency-Response graph
Related Work
• Parameter Sweep schedulers
  – Condor: Distributed batch system
  – Globus: Toolkit deployed on grid resources for automatic resource discovery and workflow scheduling
  – Virtual Instrument: Interactively steerable parameter sweep application (interaction limited to parameter space selection)
  – Nimrod/O: Parameterised simulations on distributed computers with guided search
Related Work
• Computational Steering Infrastructures
  – Falcon: On-line monitoring and steering of large-scale parallel programs
  – CUMULVS: Infrastructure for steering, monitoring, and checkpointing
  – SCIRun: Interactive computational steering environment using dataflow model
  – CSE: Steering and monitoring of computational processes on remote computers
Computational Systems
• Current systems treat each individual experiment as a “black box”
  – Individual experiment runs treated separately (as in parameter sweeps)
  – Interactivity limited to individual experiments (as in steering infrastructures)
• Thesis question: Without the “black box” restriction, can one steer 100s or 1000s of experiments simultaneously?
Thesis Statement
• By exposing application level domain knowledge to the runtime system through a more permeable application-system interface, one can bring multi-experiment computational studies to interactive speed and enable steering of entire computational studies.
Strategies
• Reducing runtime of experiments
  – Checkpointing: use the results of previously-run experiment(s) to jump-start a current one
  – Precision tradeoff: less precise experiments in return for higher throughput
• Reducing runtime of a group of experiments
  – Active Sampling: issuing fewer experiments
  – Experiment Scheduling: improve resource utilisation
Checkpointing
• Example of checkpoint reuse: use a previous result as the initial guess for an iterative method
• Reduces runtime of individual experiments
[Figure: diagram labelled A, B, C, Support 1, Support 2]
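The warm-start idea can be illustrated with a small iterative solve. This is a sketch under stated assumptions, not the SimX implementation: the 3x3 system, the Gauss-Seidel solver, and the perturbed right-hand side stand in for two neighbouring experiments in the parameter space.

```python
from typing import List, Tuple

def gauss_seidel(A: List[List[float]], b: List[float], x0: List[float],
                 tol: float = 1e-8, max_iter: int = 10_000) -> Tuple[List[float], int]:
    """Solve A x = b by Gauss-Seidel sweeps; return the solution and the sweep count."""
    x = list(x0)
    n = len(b)
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:  # stop once a full sweep barely changes x
            return x, sweep
    return x, max_iter

# Hypothetical diagonally dominant system standing in for one experiment
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b_prev = [1.0, 2.0, 3.0]    # previous experiment
b_next = [1.0, 2.05, 3.0]   # nearby experiment with a slightly changed input

x_prev, _ = gauss_seidel(A, b_prev, [0.0, 0.0, 0.0])
_, cold_sweeps = gauss_seidel(A, b_next, [0.0, 0.0, 0.0])  # start from zero
_, warm_sweeps = gauss_seidel(A, b_next, x_prev)           # start from the checkpoint
```

Because the checkpointed solution is already close to the new one, the warm start needs fewer sweeps than the cold start. The saving grows with how close the two experiments are.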
Precision Tradeoff
• Use lower resolution mesh
• Use higher residual tolerance in iterative solvers
• Reduces runtime of individual experiments
• When is the trade-off permissible?
  – The user only needs a fuzzy result
  – The system decides (e.g., a point far from the Pareto boundary)
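One way the system might make that decision is to loosen the solver tolerance as a point moves away from the current Pareto boundary. The sketch below is purely illustrative: `choose_tolerance`, its thresholds, and the log-scale interpolation are assumptions, not part of SimX.

```python
import math

def choose_tolerance(distance_to_front: float,
                     tight: float = 1e-8,   # tolerance near the boundary (assumed)
                     loose: float = 1e-3,   # tolerance far from the boundary (assumed)
                     near: float = 0.05,    # distances below this count as "near" (assumed)
                     far: float = 0.5) -> float:
    """Pick a residual tolerance: tight near the Pareto boundary, loose far from it."""
    if distance_to_front <= near:
        return tight
    if distance_to_front >= far:
        return loose
    # Interpolate on a log scale between the two tolerances
    t = (distance_to_front - near) / (far - near)
    exponent = math.log10(tight) + t * (math.log10(loose) - math.log10(tight))
    return 10.0 ** exponent

tol_near = choose_tolerance(0.0)
tol_mid = choose_tolerance(0.275)
tol_far = choose_tolerance(1.0)
```

Points that cannot plausibly end up on the boundary are then solved cheaply, while candidates near it keep full precision.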
Active Sampling
• Running only a subset of experiments in the parameter space
• Reduces number of experiments needed
• The choice of sampling strategy depends on the study, e.g.:
  – Sweep, Active, or Guided search for the Pareto Frontier
  – Graph plotting for the Frequency response
Experiment Scheduling
• Effective scheduling depends on accurate time-to-completion estimates
• Use dynamically collected data to improve time-to-completion estimates, e.g., no. of iterations needed
• Incorporate improved estimates to generate experiments’ execution schedule
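A time-to-completion estimate built from dynamically collected data could look like the sketch below. It assumes the runtime system observes iteration counts and wall-clock times of finished experiments; the `RuntimeEstimator` class and the exponential smoothing are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class RuntimeEstimator:
    alpha: float = 0.3          # weight of the newest observation (assumed value)
    sec_per_iter: float = 0.0   # running estimate of seconds per solver iteration
    samples: int = 0

    def observe(self, iterations: int, seconds: float) -> None:
        """Fold one finished experiment into the running per-iteration cost."""
        rate = seconds / iterations
        if self.samples == 0:
            self.sec_per_iter = rate
        else:
            self.sec_per_iter = self.alpha * rate + (1 - self.alpha) * self.sec_per_iter
        self.samples += 1

    def remaining(self, expected_iterations: int, done_iterations: int) -> float:
        """Estimated seconds left for an experiment that is partly done."""
        return max(expected_iterations - done_iterations, 0) * self.sec_per_iter

est = RuntimeEstimator()
est.observe(iterations=100, seconds=0.10)
est.observe(iterations=50, seconds=0.06)
eta = est.remaining(expected_iterations=80, done_iterations=20)
```

A scheduler can then order or place experiments using `eta` instead of a fixed per-experiment guess.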
Experimental System
• Parallel System software for Interactive Multi-Experiment Computational Studies (SIMECS, or SimX)
• Performs the Bridge study, the Defibrillator study, and the Frequency-response plotting study
• Used to evaluate the checkpointing, sampling, and resource allocation techniques
SimX: Bridge Experiment
• 4 stages
• Each experiment requires O(100 ms)
• 24 KB per checkpoint
Stage 1: 1735 sims
Stage 2: 4950 sims
Stage 3: 18632 sims
Stage 4: 75351 sims
SimX: Preliminary Results
• Experiments on bridge study
  – Partitioned Object Space Server: scales to 128 workers
  – Active Sampler reduces # of experiments to 1727, 2584, 4243, 4526 (from 1735, 4950, 18632, 75351)
  – Using checkpoints: 10x improvement in run time, each experiment reduced to O(10 ms)
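The per-stage effect of the Active Sampler follows directly from the counts above. The snippet is plain arithmetic on the reported numbers; it assumes the two lists line up stage by stage.

```python
# Experiment counts per stage, as reported for the bridge study
grid_counts = [1735, 4950, 18632, 75351]
active_counts = [1727, 2584, 4243, 4526]

# Reduction factor per stage: grid experiments divided by active experiments
reduction = [g / a for g, a in zip(grid_counts, active_counts)]

for stage, r in enumerate(reduction, start=1):
    print(f"Stage {stage}: {r:.1f}x fewer experiments")
```

The saving is negligible at the first stage and grows with each refinement, reaching roughly 16.6x at the fourth stage.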
SimX: Preliminary Results
[Figure: Runtime of Bridge study on SimX. Two panels: "Effect of Checkpoint on Active Sampler" (series: Active Sampler, Active Sampler w/ Checkpoint) and "Effect of Checkpoint on Grid Sampler" (series: Grid Sampler, Grid Sampler w/ Checkpoint). x-axis: Refinement Level (Level 1 to Level 4); y-axis: Log(Seconds to Level).]
Active Sampler
• Explore different types of samplers for different applications
• 4 types of samplers:
  – Grid sampler, Active sampler, Guided Search, Graph Plotting
• Thesis Goal: Identify strengths and limitations of each active sampler type
Sweep Sampling
[Figure panels: Initial Grid, First Refinement, 2nd Refinement]
• Issues experiments on a progressively finer grid over the Parameter Space
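A sweep over progressively finer grids can be sketched as follows. The example assumes a one-dimensional parameter space on [0, 1], grid spacing halved at each refinement, and points from earlier levels not being re-issued; none of these details are confirmed for SimX.

```python
from fractions import Fraction
from typing import List, Set

def sweep_levels(levels: int, initial_cells: int = 2) -> List[List[Fraction]]:
    """Return, per level, the grid points on [0, 1] not issued at an earlier level."""
    issued: Set[Fraction] = set()
    new_per_level: List[List[Fraction]] = []
    cells = initial_cells
    for _ in range(levels):
        grid = [Fraction(i, cells) for i in range(cells + 1)]
        fresh = [p for p in grid if p not in issued]
        issued.update(fresh)
        new_per_level.append(fresh)
        cells *= 2  # halve the grid spacing for the next refinement
    return new_per_level

levels = sweep_levels(3)
```

Every level refines the whole parameter space, so the number of new experiments keeps growing with the grid resolution.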
Active Sampling
[Figure panels: Initial Grid, 1st level results, First Refinement, 2nd level results, 2nd Refinement, 3rd level results]
• Refines only around the Pareto Frontier
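The refinement rule can be sketched on a toy problem. The code assumes a one-dimensional parameter on [0, 1], two observation metrics that are both minimised, and refinement by issuing the half-step neighbours of each currently Pareto-optimal point; `toy_simulation` and all other names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Obs = Tuple[float, float]  # two observation metrics, both minimised

def pareto_keys(results: Dict[float, Obs]) -> List[float]:
    """Parameters whose observations are not dominated by any other result."""
    def dominated(p: Obs) -> bool:
        return any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in results.values())
    return sorted(x for x, obs in results.items() if not dominated(obs))

def active_sample(simulate: Callable[[float], Obs], levels: int) -> Dict[float, Obs]:
    """Run a coarse grid, then refine only next to the current Pareto points."""
    step = 0.25
    results = {i * step: simulate(i * step) for i in range(5)}  # initial grid
    for _ in range(levels):
        step /= 2
        for x in pareto_keys(results):  # frontier is computed once per level
            for nx in (x - step, x + step):
                if 0.0 <= nx <= 1.0 and nx not in results:
                    results[nx] = simulate(nx)
    return results

def toy_simulation(x: float) -> Obs:
    """Stand-in experiment: metric 1 grows with x, metric 2 is smallest at x = 0.3."""
    return (x, (x - 0.3) ** 2)

results = active_sample(toy_simulation, levels=2)
```

After two refinements this issues 10 experiments, where a full grid at the same resolution would need 17, and nothing is issued in the dominated region above x = 0.5.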
Resource Allocator
• Maps jobs on the task list onto processors
  – how many processors
  – which processors
  – which checkpoints to use
• Resources considered: Network bandwidth and processor time
• Thesis goal: Quantify benefits of application-level knowledge in resource scheduling vs. the black-box approach
Resource Allocator
• Application-level knowledge used in:
• Runtime estimation
  – dynamically collected data
  – performance model supplied by user
  – combination of empirical and analytical
• Network bandwidth estimation
  – LogP model, managed by shared object layer
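As an illustration of the bandwidth estimate, the LogP model prices a pipelined point-to-point transfer of n small messages at 2o + L + (n - 1) * max(g, o). The sketch below applies that to a checkpoint split into fixed-size packets; the packet size and all parameter values are made-up placeholders, not measurements from SimX.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class LogP:
    L: float  # network latency per message, in seconds
    o: float  # processor overhead per send or receive, in seconds
    g: float  # minimum gap between consecutive sends, in seconds
    P: int    # number of processors

    def transfer_time(self, n_messages: int) -> float:
        """Time for one processor to deliver n pipelined small messages to another."""
        if n_messages <= 0:
            return 0.0
        return 2 * self.o + self.L + (n_messages - 1) * max(self.g, self.o)

def checkpoint_transfer_time(model: LogP, checkpoint_bytes: int, packet_bytes: int) -> float:
    """Estimate the time to move a checkpoint, assuming it is split into packets."""
    return model.transfer_time(math.ceil(checkpoint_bytes / packet_bytes))

# Placeholder parameters for illustration only
net = LogP(L=50e-6, o=5e-6, g=10e-6, P=128)
t = checkpoint_transfer_time(net, checkpoint_bytes=24 * 1024, packet_bytes=1024)
```

With these placeholder values a 24 KB checkpoint takes 24 packets. Comparing such an estimate with the expected runtime saving indicates whether fetching a remote checkpoint is worthwhile.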
Resource Allocator
• Scheduling Heuristics
  – Greedy
  – Fair share
  – Locality
• Current status: FIFO; always use one closest checkpoint
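The current FIFO policy with a single closest checkpoint might look like the sketch below. It assumes "closest" means nearest in the parameter space (Euclidean distance) and that every finished experiment leaves a checkpoint behind; both are assumptions, and the function names are hypothetical.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Params = Tuple[float, ...]

def closest_checkpoint(target: Params, checkpoints: Sequence[Params]) -> Optional[Params]:
    """Return the stored checkpoint nearest to the target point, if any exists."""
    if not checkpoints:
        return None
    return min(checkpoints, key=lambda c: math.dist(c, target))

def fifo_schedule(tasks: Deque[Params],
                  checkpoints: Sequence[Params]) -> List[Tuple[Params, Optional[Params]]]:
    """Process tasks in arrival order, pairing each with its closest checkpoint."""
    available = list(checkpoints)
    plan = []
    while tasks:
        task = tasks.popleft()
        plan.append((task, closest_checkpoint(task, available)))
        available.append(task)  # assumed: the finished run becomes a checkpoint
    return plan

plan = fifo_schedule(deque([(0.5, 0.5), (0.6, 0.5)]), [(0.0, 0.0), (0.4, 0.4)])
```

The second task reuses the checkpoint produced by the first. This shows why queue order matters, and why a locality-aware heuristic could do better than plain FIFO.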
Shared Object Layer
• Implementation options
  – Server/Client vs. Integrated
  – Single Server vs. Partitioned Server
  – Caching vs. no Caching
  – Client Caching vs. Cooperative Caching
• Thesis goal: Investigate implementation options and how they affect the overall performance of computational studies
Project Timeline
• End of summer 06
  – Expand the application base of SimX system:
    • Port SimX as part of SCIRun components to run the Defibrillator application
    • Evaluate the performance of SimX on Defibrillator application
    • Add error control, and space-partitioned SISOL server as needed
Project Timeline
• End of fall 06
  – Evaluate SimX capability to handle multiple parallel simulations
    • Implement parallel bridge and defibrillator simulations
    • Multi-server SISOL implementation
• End of spring 07
  – Implement and evaluate infrastructure for performance prediction