Enabling Interactive Multi-Experiment Computational Studies through a Permeable Runtime System-Application Interface

Ph.D. Thesis Proposal

Siu Yau Jun 2006

Outline

• Background & Motivation
– Computational Studies, Related Work

• Computational Systems
– Thesis Statement, Observations

• Experimental System (SimX)
– Components & Preliminary results

• Further Work

• Project Timeline

Background & Motivation

• Computer Simulation has become an integral part of the scientific method

• Widespread use of Computational Studies in science and engineering

• Much work has been done to speed up individual simulations

• But Computational Studies can involve 100s, 1000s, or 10000s of simulations...

Computational Studies

• Computational Studies
– Simulation code, run multiple times
– Parameter Space: possible inputs
– Observation Space: measured metrics
– Objectives

• Identify points in the Parameter Space that meet the objectives in the Observation Space

• Goal: Interactive computational studies

Bridge Design

• Simulation: Thin plate FEM bridge deformation

• Parameter: Column placement

• Observation: Construction cost, Max deflection

• Objectives: Pareto-optimal points, i.e. points inferior to no other point in both observation metrics
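The Pareto-optimality objective can be sketched in a few lines of Python. This is a minimal illustration, not SimX code: the `pareto_optimal` helper, the sample `designs`, and the use of the standard dominance test (no worse in both metrics, strictly better in at least one) are all assumptions made for the example.

```python
from typing import List, Tuple

# (construction cost, max deflection); lower is better for both (assumed convention)
Point = Tuple[float, float]


def pareto_optimal(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates."""

    def dominates(a: Point, b: Point) -> bool:
        # a dominates b: no worse in both metrics, strictly better in at least one
        return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical observations for five column placements
designs = [(10.0, 5.0), (12.0, 3.0), (11.0, 6.0), (15.0, 2.5), (12.0, 4.0)]
frontier = pareto_optimal(designs)
print(frontier)
```

Here (11.0, 6.0) is dropped because (10.0, 5.0) is cheaper and deflects less, and (12.0, 4.0) is dropped because (12.0, 3.0) costs the same but deflects less.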

Defibrillator Design

• Simulation: Torso conductivity FEM

• Parameter: Electrode placement, shock strength

• Observation: Damage, Uniformity, Effectiveness

• Objectives: Pareto-optimal points

Response Graph

• Simulation: Transient analysis on 2D frame with time-periodic Boundary Condition

• Parameter: Frequency

• Observation: Amplitude

• Objective: Frequency-Response graph

Related Work

• Parameter Sweep schedulers
– Condor: Distributed batch system
– Globus: Toolkit deployed on grid resources for automatic resource discovery and workflow scheduling
– Virtual Instrument: Interactively steerable parameter sweep application (interaction limited to parameter space selection)
– Nimrod/O: Parameterised simulations on distributed computers with guided search

Related Work

• Computational Steering Infrastructures
– Falcon: On-line monitoring and steering of large-scale parallel programs
– CUMULVS: Infrastructure for steering, monitoring, and checkpointing
– SCIRun: Interactive computational steering environment using dataflow model
– CSE: Steering and monitoring of computational processes on remote computers

Computational Systems

• Each individual experiment is treated as a “black box”
– Individual experiment runs treated separately (as in parameter sweeps)
– Interactivity limited to individual experiments (as in steering infrastructures)

• Thesis question: Without the “black box” restriction, can one steer 100s or 1000s of experiments simultaneously?

Thesis Statement

• By exposing application level domain knowledge to the runtime system through a more permeable application-system interface, one can bring multi-experiment computational studies to interactive speed and enable steering of entire computational studies.

Strategies

• Reducing runtime of experiments
– Checkpointing: use the results of previously run experiment(s) to jump-start a current one
– Precision tradeoff: less precise experiments in return for higher throughput

• Reducing runtime of a group of experiments
– Active Sampling: issuing fewer experiments
– Experiment Scheduling: improve resource utilisation

Checkpointing

• Example of checkpoint reuse: use a previous result as the initial guess for an iterative method

• Reduces runtime of individual experiments

[Figure: diagram labelled A, B, C, Support 1, Support 2]
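Checkpoint reuse as an initial guess can be illustrated with a small Jacobi solver. This is a sketch under stated assumptions: the 3x3 system, the `jacobi` helper, and the two right-hand sides standing in for neighbouring experiments are hypothetical and are not taken from the bridge code.

```python
from typing import List, Sequence, Tuple


def jacobi(a: Sequence[Sequence[float]], b: Sequence[float], x0: Sequence[float],
           tol: float = 1e-8, max_iter: int = 10_000) -> Tuple[List[float], int]:
    """Solve a x = b by Jacobi iteration; return the solution and iterations used."""
    n = len(b)
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        new_x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        change = max(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if change < tol:  # stop once successive iterates agree
            return x, iteration
    return x, max_iter


# Diagonally dominant toy system, so Jacobi converges
matrix = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
rhs_a = [1.0, 2.0, 3.0]    # experiment A
rhs_b = [1.01, 2.0, 3.0]   # experiment B: a nearby point in parameter space

checkpoint, _ = jacobi(matrix, rhs_a, [0.0, 0.0, 0.0])
cold_solution, cold_iters = jacobi(matrix, rhs_b, [0.0, 0.0, 0.0])
warm_solution, warm_iters = jacobi(matrix, rhs_b, checkpoint)  # reuse A's result
print(cold_iters, warm_iters)
```

Both runs converge to the same answer, but the warm start begins much closer to it and therefore needs fewer iterations.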

Precision Tradeoff

• Use lower resolution mesh

• Use higher residual tolerance in iterative solvers

• Reduces runtime of individual experiments

• When is the trade-off permissible?
– The user only needs a fuzzy result
– The system decides (e.g., a point far from the Pareto boundary)
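The resolution side of the tradeoff can be shown on a toy problem. The sketch below is an assumption-laden stand-in: it uses trapezoid integration of sin(x) on [0, π] in place of an FEM mesh, and the `trapezoid` helper and the cell counts are hypothetical.

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], lo: float, hi: float, cells: int) -> float:
    """Integrate f over [lo, hi] with the trapezoid rule on `cells` equal cells."""
    h = (hi - lo) / cells
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, cells):
        total += f(lo + i * h)
    return total * h


exact = 2.0  # integral of sin(x) over [0, pi]
coarse = trapezoid(math.sin, 0.0, math.pi, 8)     # 9 function evaluations
fine = trapezoid(math.sin, 0.0, math.pi, 1024)    # 1025 function evaluations
coarse_error = abs(coarse - exact)
fine_error = abs(fine - exact)
print(coarse_error, fine_error)
```

The coarse run does roughly 1% of the work and is still accurate to a few percent, which may be enough for a point that is clearly far from the Pareto boundary.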

Active Sampling

• Running only a subset of experiments in the parameter space

• Reduces number of experiments needed

• Which strategy depends on the study, e.g.:
– Sweep, Active, Guided search for Pareto Frontier
– Graph plotting for Frequency response

Experiment Scheduling

• Effective scheduling depends on accurate time-to-completion estimates

• Use dynamically collected data to improve time-to-completion estimates, e.g., the number of iterations needed

• Incorporate the improved estimates when generating the experiments’ execution schedule
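One way dynamically collected data could feed a time-to-completion estimate is to extrapolate the solver's residual history. This is a sketch, not the SimX estimator: the geometric-decay assumption, the `estimate_remaining_iterations` helper, and the per-iteration cost are all hypothetical.

```python
import math
from typing import Sequence


def estimate_remaining_iterations(residuals: Sequence[float], tol: float) -> int:
    """Estimate iterations still needed, assuming the residual decays geometrically."""
    if len(residuals) < 2:
        raise ValueError("need at least two residual samples to extrapolate")
    if residuals[-1] <= tol:
        return 0
    # Average contraction factor observed so far
    rate = (residuals[-1] / residuals[0]) ** (1.0 / (len(residuals) - 1))
    if rate >= 1.0:
        raise ValueError("residual is not decreasing; cannot extrapolate")
    return math.ceil(math.log(tol / residuals[-1]) / math.log(rate))


observed = [1.0, 0.5, 0.25, 0.125]   # residuals reported by a running experiment
seconds_per_iteration = 0.01          # assumed, measured from the same run
remaining = estimate_remaining_iterations(observed, tol=1e-3)
eta_seconds = remaining * seconds_per_iteration
print(remaining, eta_seconds)
```

A scheduler could refresh such estimates as experiments report progress and reorder the pending work accordingly.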

Experimental System

• Parallel system software for Interactive Multi-Experiment Computational Studies (SIMECS, or SimX)

• Performs the Bridge study, Defibrillator study, and the Frequency-response plotting study

• Used to evaluate the checkpointing, sampling, and resource allocation techniques

SimX Architecture

Prototype implemented and presented at IPDPS 06

SimX: Bridge Experiment

• 4 stages

• Each experiment requires O(100 ms)

• 24 KB per checkpoint

Stage 1: 1735 sims
Stage 2: 4950 sims
Stage 3: 18632 sims
Stage 4: 75351 sims

SimX: Preliminary Results

• Experiments on bridge study
– Partitioned Object Space Server: scales to 128 workers
– Active Sampler reduces the number of experiments per stage to 1727, 2584, 4243, 4526 (from 1735, 4950, 18632, 75351)
– Using checkpoints: 10x improvement in run time, each experiment reduced to O(10 ms)

SimX: Preliminary Results

[Figure: Runtime of Bridge study on SimX. Two charts plot Log(Seconds to Level) against Refinement Level (Level 1 to Level 4): “Effect of Checkpoint on Active Sampler” (series: Active Sampler, Active Sampler w/Checkpoint) and “Effect of Checkpoint on Grid Sampler” (series: Grid Sampler, Grid Sampler w/Checkpoint).]

Active Sampler

• Explore different types of samplers for different applications

• 4 types of samplers:
– Grid sampler, Active sampler, Guided Search, Graph Plotting

• Thesis Goal: Identify strengths and limitations of each active sampler type

Sweep Sampling

[Figure panels: Initial Grid, First Refinement, 2nd Refinement]

• Issues experiments on progressively finer grids over the Parameter Space
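A sweep over progressively finer grids might look like the sketch below. It assumes a unit-square parameter space whose grid spacing halves at every level; the helper names and the starting resolution are hypothetical, so the counts will not match the stage sizes reported for the bridge study.

```python
from fractions import Fraction
from typing import List, Set, Tuple

GridPoint = Tuple[Fraction, Fraction]


def grid_points(level: int, base_cells: int = 2) -> Set[GridPoint]:
    """All points of a uniform grid on [0, 1]^2 with base_cells * 2**level cells per axis."""
    n = base_cells * 2 ** level
    return {(Fraction(i, n), Fraction(j, n)) for i in range(n + 1) for j in range(n + 1)}


def sweep_levels(levels: int) -> List[Set[GridPoint]]:
    """For each refinement level, return only the points not issued at a coarser level."""
    issued: Set[GridPoint] = set()
    batches: List[Set[GridPoint]] = []
    for level in range(levels):
        new_points = grid_points(level) - issued
        batches.append(new_points)
        issued |= new_points
    return batches


batch_sizes = [len(batch) for batch in sweep_levels(3)]
print(batch_sizes)
```

Exact fractions are used so that a point shared by two levels compares equal and is issued only once. The number of new experiments grows roughly fourfold per level in two dimensions.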

Active Sampling

[Figure panels: Initial Grid, 1st level results, First Refinement, 2nd level results, 2nd Refinement, 3rd level results]

• Refines only on the Pareto Frontier
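Refinement restricted to the Pareto frontier can be sketched on a one-dimensional parameter space. Everything here is an illustrative assumption: the `bridge_like` toy experiment, the neighbour rule (one finer grid step on either side of each frontier point), and the helper names.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

Observation = Tuple[float, float]  # (cost, deflection), lower is better (assumed)


def pareto_keys(results: Dict[Fraction, Observation]) -> List[Fraction]:
    """Parameters whose observations are not dominated by any other observation."""

    def dominates(a: Observation, b: Observation) -> bool:
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    return [p for p, obs in results.items()
            if not any(dominates(other, obs) for other in results.values())]


def active_sample(run: Callable[[float], Observation], levels: int,
                  initial_cells: int = 4) -> Dict[Fraction, Observation]:
    """Run a coarse grid, then refine only next to the current Pareto points."""
    step = Fraction(1, initial_cells)
    results = {i * step: run(float(i * step)) for i in range(initial_cells + 1)}
    for _ in range(levels):
        step /= 2
        for p in pareto_keys(results):        # frontier fixed at the start of the level
            for q in (p - step, p + step):    # neighbours on the finer grid
                if 0 <= q <= 1 and q not in results:
                    results[q] = run(float(q))
    return results


def bridge_like(x: float) -> Observation:
    # Toy experiment: cost grows with x, deflection is smallest near x = 0.3
    return (x, (x - 0.3) ** 2)


samples = active_sample(bridge_like, levels=2)
print(len(samples), sorted(samples))
```

In this toy case the sampler runs 10 experiments where a full sweep at the same final resolution would run 17, because the region beyond the frontier is never refined.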

Guided Search

• Start from random points, follow the better-performing neighbors
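A guided search of this kind is essentially a local descent with random restarts. The sketch below assumes a single scalar score on a square integer grid, which simplifies the multi-metric studies described earlier; `guided_search` and the toy `score` are hypothetical.

```python
import random
from typing import Callable, Optional, Tuple

Cell = Tuple[int, int]


def guided_search(score: Callable[[Cell], float], size: int,
                  starts: int, seed: int = 0) -> Cell:
    """From random start cells, repeatedly step to the best-scoring neighbour."""
    rng = random.Random(seed)
    best: Optional[Cell] = None
    for _ in range(starts):
        x, y = rng.randrange(size), rng.randrange(size)
        while True:
            neighbours = [(x + dx, y + dy)
                          for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= x + dx < size and 0 <= y + dy < size]
            candidate = min(neighbours, key=score)   # lower score is better
            if score(candidate) >= score((x, y)):
                break                                 # local optimum reached
            x, y = candidate
        if best is None or score((x, y)) < score(best):
            best = (x, y)
    assert best is not None
    return best


def score(cell: Cell) -> float:
    # Toy objective with a single minimum at (7, 3)
    return (cell[0] - 7) ** 2 + (cell[1] - 3) ** 2


optimum = guided_search(score, size=10, starts=3)
print(optimum)
```

On an objective with several local optima, the random restarts are what give the search a chance of finding the better ones.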

Graph Plotting

• Uniform sampling first, then fill in details
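The two-phase idea can be sketched for a frequency-response curve. The assumptions are substantial: a damped single-degree-of-freedom oscillator stands in for the 2D frame simulation, and "fill in details" is interpreted as bisecting the interval with the largest amplitude change. The helper names and parameters are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def amplitude(freq: float, natural: float = 1.0, damping: float = 0.05) -> float:
    """Steady-state amplitude of a damped oscillator under unit harmonic forcing."""
    return 1.0 / math.sqrt((natural ** 2 - freq ** 2) ** 2
                           + (2.0 * damping * natural * freq) ** 2)


def plot_samples(f: Callable[[float], float], lo: float, hi: float,
                 uniform: int, extra: int) -> List[Tuple[float, float]]:
    """Sample f uniformly, then spend `extra` samples where the curve changes most."""
    points = [(lo + (hi - lo) * i / (uniform - 1), 0.0) for i in range(uniform)]
    points = [(x, f(x)) for x, _ in points]
    for _ in range(extra):
        # Pick the interval with the largest jump in amplitude and bisect it
        i = max(range(len(points) - 1),
                key=lambda k: abs(points[k + 1][1] - points[k][1]))
        mid = 0.5 * (points[i][0] + points[i + 1][0])
        points.insert(i + 1, (mid, f(mid)))
    return points


curve = plot_samples(amplitude, 0.0, 2.0, uniform=9, extra=16)
near_resonance = sum(1 for x, _ in curve if 0.75 <= x <= 1.25)
print(len(curve), near_resonance)
```

Most of the extra experiments land around the resonance peak near frequency 1.0, where the uniform pass alone would draw a badly under-resolved curve.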

Resource Allocator

• Maps jobs on the task list onto processors
– how many processors
– which processors
– which checkpoints to use

• Resources considered: Network bandwidth and processor time

• Thesis goal: Quantify benefits of application-level knowledge in resource scheduling vs. a black-box approach

Resource Allocator

• Application-level knowledge used in:

• Runtime estimation
– dynamically collected data
– performance model supplied by user
– combination of empirical and analytical

• Network bandwidth estimation:
– LogP model, managed by shared object layer

Resource Allocator

• Scheduling Heuristics
– Greedy
– Fair share
– Locality

• Current status: FIFO; always uses the single closest checkpoint
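The current FIFO policy with a single closest checkpoint could be sketched as follows. This is not the SimX allocator: "closest" is assumed to mean Euclidean distance in parameter space, each finished job is assumed to leave a checkpoint behind, and the function names are hypothetical.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Param = Tuple[float, ...]


def closest_checkpoint(target: Param, checkpoints: Sequence[Param]) -> Optional[Param]:
    """Pick the stored checkpoint nearest to the target in parameter space."""
    if not checkpoints:
        return None
    return min(checkpoints, key=lambda p: math.dist(p, target))


def run_fifo(jobs: Sequence[Param]) -> List[Tuple[Param, Optional[Param]]]:
    """Process jobs in arrival order, pairing each with its closest checkpoint."""
    checkpoints: List[Param] = []
    plan: List[Tuple[Param, Optional[Param]]] = []
    queue = deque(jobs)
    while queue:
        job = queue.popleft()            # FIFO: oldest job first
        plan.append((job, closest_checkpoint(job, checkpoints)))
        checkpoints.append(job)          # the finished job becomes a checkpoint
    return plan


jobs = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.9), (0.2, 0.1)]
plan = run_fifo(jobs)
for job, start_from in plan:
    print(job, "starts from", start_from)
```

The first job has no checkpoint to start from. A locality-aware heuristic would additionally reorder the queue so that jobs run soon after a nearby neighbour completes.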

Shared Object Layer

• Implementation options
– Server/Client vs. Integrated
– Single Server vs. Partitioned Server
– Caching vs. no Caching
– Client Caching vs. Cooperative Caching

• Thesis goal: Investigate implementation options and how they affect the overall performance of computational studies

Project Timeline

• End of summer 06
– Expand the application base of the SimX system:
  • Port SimX as part of SCIRun components to run the Defibrillator application
  • Evaluate the performance of SimX on the Defibrillator application
  • Add error control, and space-partitioned SISOL server as needed

Project Timeline

• End of fall 06
– Evaluate SimX capability to handle multiple parallel simulations
  • Implement parallel bridge and defibrillator simulations
  • Multi-server SISOL implementation

• End of spring 07
– Implement and evaluate infrastructure for performance prediction

Project Timeline

• End of fall 07
– Evaluate sampler policies (Grid vs. Active vs. Guided search vs. Graph plotter) on bridge, defibrillator, and graph plotting applications
– Evaluate performance of resource allocation heuristics (FIFO vs. Greedy vs. Fair share vs. Locality)