Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications
![Page 1: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/1.jpg)
Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications
Piyush Shivam, Shivnath Babu, Jeffrey Chase
Duke University
![Page 2: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/2.jpg)
Networked Computing Utility
• A network of clusters or grid sites
• Each site is a pool of heterogeneous resources
• Jobs are task workflows
• Challenge: choose good resource assignments for the jobs
[Diagram: a task workflow, a task scheduler, and clusters C1, C2, C3 at Sites A, B, C]
![Page 3: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/3.jpg)
Example: Assigning Resources to Run Tasks
• A workflow with a single task
• Task input data at Site A
• An execution plan is a resource assignment

| Plan | CPU | Storage |
|------|-----|---------|
| P1 | Site A | Site A |
| P2 | Site B | Site A |
| P3 | Site B | Site B |

[Diagram: clusters C1, C2, C3 at Sites A, B, C, a home file server, and plans P1, P2, P3]
![Page 4: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/4.jpg)
Plan Selection Problem
[Diagram: task workflow → plan enumeration → cost of each plan → choose best plan]

| Plan | CPU | Storage | Cost |
|------|-----|---------|------|
| P1 | Site A | Site A | T1 |
| P2 | Site B | Site A | T2 |
| … | … | … | … |

• Cost: plan execution time
• Challenge: need cost models to estimate plan execution time
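Once a cost model exists, plan selection reduces to picking the enumerated plan with the lowest predicted execution time. The sketch below is minimal and assumes the cost model is any function from a plan to a predicted time. `toy_predict_time` and its numbers are hypothetical placeholders, not a learned NIMO model:

```python
from typing import Callable, Dict, Tuple

# A plan maps the task to (CPU site, storage site), as in the plan table.
Plan = Tuple[str, str]

def choose_best_plan(plans: Dict[str, Plan],
                     predict_time: Callable[[Plan], float]) -> str:
    """Return the name of the plan with the lowest predicted execution time."""
    return min(plans, key=lambda name: predict_time(plans[name]))

# Hypothetical cost model: remote storage adds a fixed penalty (illustration only).
def toy_predict_time(plan: Plan) -> float:
    cpu_site, storage_site = plan
    base = {"Site A": 100.0, "Site B": 60.0}[cpu_site]
    return base + (0.0 if cpu_site == storage_site else 50.0)

plans = {"P1": ("Site A", "Site A"),
         "P2": ("Site B", "Site A"),
         "P3": ("Site B", "Site B")}
print(choose_best_plan(plans, toy_predict_time))  # P3
```

The quality of the choice depends entirely on how accurate `predict_time` is. That is why the rest of the talk focuses on learning cost models.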
![Page 5: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/5.jpg)
Generating Cost Models is Hard
• Non-declarative
– Scientific workflow tasks are usually scripts (MATLAB, Perl)
– Such tasks are not database operators like join or select
– Hence: task is a black box with no prior knowledge
• Heterogeneous resources
– Computational grid setting
– Performance varies a lot across resource assignments
• Data dependency
– Performance can vary significantly based on properties of input data & parameters to scripts
![Page 6: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/6.jpg)
Problem Setting
• Scientific workflows at DSCR (Duke Shared Cluster Resource)
• Important scientific workflows are run repeatedly
– Opportunity to observe & learn task behavior
– Better plan selection for subsequent runs
• Sequential scientific workflows
– Each task runs on a single node
– >90% of workflows at DSCR are sequential
![Page 7: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/7.jpg)
NIMO System: NonInvasive Modeling for Optimization
NIMO learns cost models for task workflows
– End-to-end cost models
• Incorporate properties of tasks, resources, & data
– Non-invasive
• No changes to tasks
– Automated and active
• Automatically collects training data for learning cost models
[Diagram: NIMO and the scheduler, with clusters C1, C2, C3 at Sites A, B, C]
![Page 8: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/8.jpg)
NIMO Fills a Gap
• Workflow Management Systems (WFMSs)
– WFMSs use database technology for managing all aspects of scientific workflows [Liu ‘04, Shankar ‘05]
• Batch scheduling systems
– Knowledge of plan execution time is assumed for optimizing resource assignments [Casanova ‘00, Phan ‘05, Kelly ‘03]
NIMO generates cost models for these systems
![Page 9: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/9.jpg)
Roadmap
• Cost models
• NIMO: active learning of cost models
• Experimental evaluation
• Related work
• Conclusions
• Future work
![Page 10: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/10.jpg)
Cost Model
[Diagram: for each task in a task workflow, a cost model maps the task, its input data, and a resource assignment to an execution time]
• Total workflow execution time can be derived using the cost models for individual tasks
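Here is one way to derive workflow time from per-task predictions, sketched under stated assumptions. A chain of tasks runs back to back, so its time is the sum of task times. For a DAG of tasks, assuming enough resources to run independent tasks in parallel, the workflow time is the critical-path length. Queueing and inter-task transfer delays are ignored, and the task times below are made up:

```python
from typing import Callable, Dict, List

def sequential_workflow_time(tasks: List[str],
                             predict_task_time: Callable[[str], float]) -> float:
    """Chain of tasks run back to back: total time is the sum of task times."""
    return sum(predict_task_time(t) for t in tasks)

def dag_workflow_time(deps: Dict[str, List[str]],
                      predict_task_time: Callable[[str], float]) -> float:
    """Acyclic task graph with unlimited parallelism: length of the critical path.

    deps[t] lists the tasks that must finish before t starts; every task must be a key.
    """
    finish: Dict[str, float] = {}

    def finish_time(t: str) -> float:
        if t not in finish:
            start = max((finish_time(p) for p in deps[t]), default=0.0)
            finish[t] = start + predict_task_time(t)
        return finish[t]

    return max(finish_time(t) for t in deps)

# Made-up per-task predictions (seconds) standing in for learned cost models.
times = {"A": 10.0, "B": 5.0, "C": 7.0, "D": 3.0}
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

print(sequential_workflow_time(["A", "B", "C", "D"], times.__getitem__))  # 25.0
print(dag_workflow_time(deps, times.__getitem__))                        # 20.0
```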
![Page 11: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/11.jpg)
Task Cost Model
$$T = D \times (O_a + O_s) = D \times (O_a + O_n + O_d)$$
• $T$: execution time; $D$: total data
• $O_a$: compute occupancy (compute phase: compute resource busy)
• $O_s = O_n + O_d$: stall occupancy (stall phase: compute resource stalled on I/O)
• $O_n$: network occupancy; $O_d$: storage occupancy
• Occupancy: average time spent per unit of data
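The cost model is direct to evaluate once the occupancies and the data size are known. This is a minimal sketch; the units (seconds per MB, MB) and the numbers are assumptions chosen for illustration:

```python
from dataclasses import dataclass

@dataclass
class Occupancies:
    """Average time spent per unit of data (assumed unit: seconds per MB)."""
    compute: float  # O_a: compute resource busy
    network: float  # O_n: compute resource stalled on network I/O
    storage: float  # O_d: compute resource stalled on storage I/O

    @property
    def stall(self) -> float:
        # O_s = O_n + O_d
        return self.network + self.storage

def execution_time(total_data: float, occ: Occupancies) -> float:
    """T = D * (O_a + O_s) = D * (O_a + O_n + O_d)."""
    return total_data * (occ.compute + occ.stall)

occ = Occupancies(compute=0.5, network=0.25, storage=0.125)
print(execution_time(800.0, occ))  # 700.0
```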
![Page 12: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/12.jpg)
Cost Model
[Diagram: task, input data, and resource assignment → cost model → execution time]
$$T = D \times (O_a + O_n + O_d)$$
• Task profile (task), data profile (input data), resource profile (resource assignment)
![Page 13: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/13.jpg)
Learning Cost Models
Learning the cost model = Learning profiles + Learning predictors
![Page 14: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/14.jpg)
Statistical Learning of Predictors
• Independent variables: resource profile, data profile
• Dependent variables: the cost-model terms being predicted
• Ex: learn each predictor as a regression model from the training data
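The sketch below learns a predictor as a regression model. It assumes a one-attribute resource profile (CPU speed) and a linear model in 1/speed for the compute occupancy, on the premise that compute time per unit of data shrinks as the CPU gets faster. The training samples are hypothetical. NIMO's predictors take several profile attributes, so this shows only the simplest case:

```python
from typing import List, Tuple

def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two paired samples")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the independent variable does not vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical training samples: CPU speed (GHz) -> measured compute occupancy (s/MB).
cpu_ghz = [1.0, 2.0, 4.0]
o_a = [2.1, 1.1, 0.6]

# Assumed feature: 1 / CPU speed.
b0, b1 = fit_linear([1.0 / s for s in cpu_ghz], o_a)

def predict_o_a(speed_ghz: float) -> float:
    """Learned predictor for compute occupancy on an unseen CPU speed."""
    return b0 + b1 / speed_ghz

print(round(predict_o_a(3.0), 3))  # 0.767
```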
![Page 15: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/15.jpg)
Challenges in Learning
• Cost of sample acquisition
• Coverage of system operating range
• Curse of dimensionality
– Suppose there are 10 profile attributes with 10 values per attribute, and each task run (sample) takes 5 minutes. Sampling just 1% of the space to build a cost model would take 951 years!
[Figure: accuracy of the current best model vs. elapsed time, for passive learning and for active & accelerated learning, relative to the best accuracy possible]
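The 951-year figure follows from simple arithmetic, reproduced here assuming 365-day years:

```python
ATTRIBUTES = 10        # profile attributes
VALUES_PER_ATTR = 10   # candidate values per attribute
SAMPLE_PERCENT = 1     # sample 1% of the space
MINUTES_PER_RUN = 5    # one task run yields one training sample

space = VALUES_PER_ATTR ** ATTRIBUTES          # 10^10 points
samples = space * SAMPLE_PERCENT // 100        # 10^8 task runs
minutes = samples * MINUTES_PER_RUN            # 5 * 10^8 minutes
years = minutes / (60 * 24 * 365)              # 365-day years
print(f"{years:.1f} years")  # 951.3 years
```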
![Page 16: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/16.jpg)
Active (and Accelerated) Learning
• Which predictors are important?
• Which profile attributes should each predictor have?
• What values to consider for each profile attribute during training?
[Figure: resource profile and data profile attributes]
![Page 17: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/17.jpg)
NIMO System
[Diagram: NIMO alongside the scheduler and clusters C1, C2, C3 at Sites A, B, C]
• NIMO workbench, with a WAN emulator (nistnet)
• Training set database
• Active & accelerated learning
• Task profiler
• Resource profiler: runs standard benchmarks
• Data profiler
![Page 18: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/18.jpg)
Active Learning Algorithm
Initialization
While( ) {
}
![Page 19: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/19.jpg)
Active Learning Algorithm
Initialization
While( ) {
  Pick a new assignment
  Run task on chosen assignment
  Relearn predictors
}
Relearn Predictors
• Relearn predictors with the new set of training samples
• Compute current prediction error of each predictor
– Fixed test set
– Cross-validation
[Figure: table of training samples]
![Page 20: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/20.jpg)
Active Learning Algorithm
Initialization
While( ) {
  Choose a predictor to refine
  Choose attributes for the predictor
  Choose attribute values for the run
  Run task on chosen assignment
  Relearn predictors
}
Predictor Choice
• Predictors: fa, fn, fd, fD
• Order predictors + traverse this order
– Ex: relevance-based order (Plackett-Burman)
– Ex: choose predictor with current max. error
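The loop can be sketched end to end under several simplifying assumptions:

- A toy task whose occupancies each depend on a single attribute.
- Prediction error measured on a fixed test set.
- One-attribute linear predictors.
- The dynamic "choose predictor with current max. error" policy.

Only fa, fn, fd are modelled (fD is left out). `run_task`, `ATTR_OF`, and every attribute value are hypothetical stand-ins, not NIMO's implementation:

```python
from typing import Dict, List, Tuple

Model = Tuple[float, float]  # (intercept, slope) of a one-attribute linear predictor

# Illustrative assumptions: each predictor depends on one attribute.
ATTR_OF = {"fa": "cpu_ghz", "fn": "latency_ms", "fd": "mem_mb"}
VALUES = {"cpu_ghz": [1.0, 3.0, 2.0],        # candidate values: extremes, then midpoint
          "latency_ms": [0.0, 18.0, 9.0],
          "mem_mb": [256.0, 1024.0, 640.0]}
DEFAULT = {"cpu_ghz": 2.0, "latency_ms": 9.0, "mem_mb": 512.0}


def run_task(a: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical stand-in for running the task and measuring its occupancies."""
    return {"fa": 2.0 / a["cpu_ghz"], "fn": 0.1 + 0.01 * a["latency_ms"], "fd": 0.5}


def fit(samples: List[Tuple[float, float]]) -> Model:
    """Least-squares line; the sample mean if x never varied; zero if no data yet."""
    if not samples:
        return (0.0, 0.0)
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    if sxx == 0:
        return (my, 0.0)
    b1 = sum((x - mx) * (y - my) for x, y in samples) / sxx
    return (my - b1 * mx, b1)


def error(name: str, model: Model, test_set: List[Dict[str, float]]) -> float:
    """Mean absolute % error of one predictor on a fixed test set."""
    b0, b1 = model
    errs = []
    for a in test_set:
        truth = run_task(a)[name]
        errs.append(abs(b0 + b1 * a[ATTR_OF[name]] - truth) / truth * 100.0)
    return sum(errs) / len(errs)


def active_learning(budget: int, test_set: List[Dict[str, float]]) -> Dict[str, float]:
    """Run up to `budget` task runs; return each predictor's final test-set error (%)."""
    samples: Dict[str, List[Tuple[float, float]]] = {n: [] for n in ATTR_OF}
    models = {n: fit([]) for n in ATTR_OF}          # initialization: untrained predictors
    used = {n: 0 for n in ATTR_OF}                  # candidate values already tried
    for _ in range(budget):
        errs = {n: error(n, models[n], test_set) for n in ATTR_OF}
        open_ = [n for n in ATTR_OF if used[n] < len(VALUES[ATTR_OF[n]])]
        if not open_:
            break
        name = max(open_, key=lambda n: errs[n])    # choose predictor: max current error
        attr = ATTR_OF[name]                        # choose attribute for that predictor
        value = VALUES[attr][used[name]]            # choose value: next in the sweep
        used[name] += 1
        assignment = {**DEFAULT, attr: value}       # other attributes stay at defaults
        observed = run_task(assignment)             # run task on chosen assignment
        for n, a in ATTR_OF.items():                # one run yields a sample per predictor
            samples[n].append((assignment[a], observed[n]))
        models = {n: fit(samples[n]) for n in ATTR_OF}  # relearn predictors
    return {n: error(n, models[n], test_set) for n in ATTR_OF}


TEST_SET = [{"cpu_ghz": 1.5, "latency_ms": 5.0, "mem_mb": 512.0},
            {"cpu_ghz": 2.5, "latency_ms": 12.0, "mem_mb": 768.0}]
print(active_learning(budget=9, test_set=TEST_SET))
```

In this toy setup, fn and fd end with near-zero error because their true relationships are linear. fa keeps some error because a straight line in CPU speed cannot capture 2/speed exactly.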
![Page 21: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/21.jpg)
Active Learning Algorithm
Attribute Choice
• Each predictor takes profile attributes as input
• Not all attributes are equally relevant
• Order attributes + Traverse this order
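"Order attributes + traverse this order" can be sketched as follows, under two assumptions. First, relevance scores are already available (for example from effect estimates). Second, traversal stops once adding the next attribute improves the predictor's error by less than a threshold, in the spirit of the improvement-threshold traversal mentioned in the backup slides. `error_with` and the error table are hypothetical stand-ins for retraining and re-evaluating the predictor:

```python
from typing import Callable, Dict, List, Sequence

def order_attributes(relevance: Dict[str, float]) -> List[str]:
    """Most relevant attribute first, by magnitude of its (assumed) relevance score."""
    return sorted(relevance, key=lambda a: abs(relevance[a]), reverse=True)

def select_attributes(order: Sequence[str],
                      error_with: Callable[[List[str]], float],
                      min_gain: float) -> List[str]:
    """Walk the order, keeping each attribute while it cuts the error by >= min_gain."""
    chosen: List[str] = []
    best = error_with(chosen)
    for attr in order:
        err = error_with(chosen + [attr])
        if best - err < min_gain:
            break  # stop at the first attribute that no longer pays off
        chosen.append(attr)
        best = err
    return chosen

# Hypothetical relevance scores and prediction errors (%) for each attribute subset.
relevance = {"cpu_ghz": 0.9, "latency_ms": 0.3, "mem_mb": -0.05}
ERROR_TABLE = {(): 40.0, ("cpu_ghz",): 12.0,
               ("cpu_ghz", "latency_ms"): 10.0,
               ("cpu_ghz", "latency_ms", "mem_mb"): 9.8}

order = order_attributes(relevance)
print(select_attributes(order, lambda attrs: ERROR_TABLE[tuple(attrs)], min_gain=1.0))
# ['cpu_ghz', 'latency_ms']
```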
![Page 22: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/22.jpg)
Active Learning Algorithm
Value Choice
• Cover the operating range of attributes
• Expose main interactions with other attributes
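Both goals can be met in one simple way, sketched here under assumptions:

- Order each attribute's candidate values as extremes first, then successive midpoints, so that early runs span the operating range.
- Vary several attributes jointly at their low and high levels (a two-level full factorial, 2^k runs for k attributes) so that main interactions can show up.

This is a generic design-of-experiments illustration, not necessarily the value-selection scheme NIMO uses. The attribute names and ranges are hypothetical:

```python
from itertools import product
from typing import Dict, List, Tuple

def range_values(lo: float, hi: float, depth: int) -> List[float]:
    """Extremes first, then successive midpoints, so early runs span the whole range."""
    values = [lo, hi]
    intervals = [(lo, hi)]
    for _ in range(depth):
        next_intervals = []
        for a, b in intervals:
            mid = (a + b) / 2
            values.append(mid)
            next_intervals += [(a, mid), (mid, b)]
        intervals = next_intervals
    return values

def two_level_design(ranges: Dict[str, Tuple[float, float]]) -> List[Dict[str, float]]:
    """Every low/high combination (2^k runs for k attributes), exposing interactions."""
    names = list(ranges)
    return [dict(zip(names, levels)) for levels in product(*(ranges[n] for n in names))]

print(range_values(1.0, 5.0, 2))  # [1.0, 5.0, 3.0, 2.0, 4.0]
print(two_level_design({"cpu_ghz": (1.0, 3.0), "latency_ms": (0.0, 18.0)}))
```

The full factorial grows as 2^k, so it is practical only for the few attributes already judged most relevant.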
![Page 23: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/23.jpg)
Experimental Results
• Biomedical workflows (from DSCR)
– BLAST, fMRI, NAMD, CardioWave
– Single task workflows
• Plan space in the heterogeneous networked utility
– 5 CPU speeds, 6 network latencies, 5 memory sizes
– 5 × 6 × 5 = 150 resource plans
• Goal: converge quickly to a fairly accurate cost model
– We use regression models for the predictors
– Model validation details in previous work (ICAC 2005)
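The 150-plan space is the Cartesian product of the three attribute sets. The slide gives only the counts, so the level labels below are placeholders:

```python
from itertools import product

# Placeholder level labels: the slide gives only how many levels each attribute has.
cpu_speeds = [f"cpu{i}" for i in range(5)]
net_latencies = [f"lat{i}" for i in range(6)]
memory_sizes = [f"mem{i}" for i in range(5)]

# Each resource plan is one (CPU speed, network latency, memory size) combination.
plans = list(product(cpu_speeds, net_latencies, memory_sizes))
print(len(plans))  # 150
```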
![Page 24: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/24.jpg)
Performance Summary
• Error: mean absolute % error in predicted execution time
• A separate test set for evaluating the error
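The error metric can be written as a small function. This sketch assumes non-zero actual times. The sample numbers are illustrative, not results from the experiments:

```python
from typing import Sequence

def mean_abs_pct_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean over the test set of |predicted - actual| / actual, as a percentage."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equally sized, non-empty sequences")
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)

# Illustrative numbers only: two test runs, each mispredicted by 10%.
print(mean_abs_pct_error([110.0, 90.0], [100.0, 100.0]))
```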
![Page 25: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/25.jpg)
BLAST Application: Predictor Choice
![Page 26: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/26.jpg)
BLAST Application: Attribute Choice
![Page 27: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/27.jpg)
Related Work
• Workflow Management Systems (WFMSs)
– [Shankar ’05, Liu ’04 etc.]
• Performance prediction in scientific applications
– [Carrington ’05, Rosti ’02, etc.]
• Learning cost models using statistical techniques
– [Zhang ’05, Zhu ’96, etc.]
• NIMO is end-to-end, noninvasive, and active (acquires model learning data automatically)
![Page 28: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/28.jpg)
Conclusions
• NIMO:
– Learns cost models for scientific workflows
– Noninvasive and end-to-end
– Active and accelerated learning: Learns accurate cost models quickly
– Fills a gap in Workflow Management Systems
![Page 29: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/29.jpg)
Future Work
• NIMO + SHIRAKO
– A policy-based resource-leasing system that can slice-and-dice virtualized resources
• NIMO + Fa
– Processing system-management queries (e.g., root-cause diagnosis, forecasting performance problems, capacity planning)
[Diagram: NIMO and the scheduler, with clusters C1, C2, C3 at Sites A, B, C]
![Page 30: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/30.jpg)
Backup Slides for Explanation
![Page 31: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/31.jpg)
See Paper for Details of Steps
• Each algorithm step has sub-algorithms
• Example: choosing the predictor to refine in the current step
– Goal: learn the most relevant predictors first
– Static vs. dynamic ordering
• Static:
– Define a total order: a priori or using estimates of influence (Plackett-Burman)
– Traverse the order: round-robin vs. improvement-threshold-based
• Dynamic: choose the predictor with the maximum current prediction error
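Plackett-Burman screening estimates each factor's main effect from a handful of two-level runs. The sketch below uses the standard 8-run design, which handles up to 7 factors in coded ±1 units. It ranks factors by |effect| to form a static total order, then traverses that order round-robin. This is a generic illustration under assumptions: the factor names and the linear `toy_response` are made up, and how NIMO maps design factors to predictors is left to the paper:

```python
from itertools import cycle, islice
from typing import Callable, Dict, List, Sequence

# Standard first row of the 8-run Plackett-Burman design (+1 = high, -1 = low level).
GENERATOR_8 = [+1, +1, +1, -1, +1, -1, -1]

def pb_design_8() -> List[List[int]]:
    """8 runs x 7 columns: the 7 cyclic shifts of the generator plus an all-low run."""
    # For i = 0 the slices are the whole list and [], so the first row is the generator.
    rows = [GENERATOR_8[-i:] + GENERATOR_8[:-i] for i in range(7)]
    rows.append([-1] * 7)
    return rows

def main_effects(factors: Sequence[str],
                 response: Callable[[Dict[str, int]], float]) -> Dict[str, float]:
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    if len(factors) > 7:
        raise ValueError("the 8-run design screens at most 7 factors")
    design = pb_design_8()
    # Columns beyond len(factors) act as unused dummy factors (zip truncates them).
    ys = [response(dict(zip(factors, row))) for row in design]
    effects: Dict[str, float] = {}
    for j, f in enumerate(factors):
        hi = [y for row, y in zip(design, ys) if row[j] > 0]
        lo = [y for row, y in zip(design, ys) if row[j] < 0]
        effects[f] = sum(hi) / len(hi) - sum(lo) / len(lo)
    return effects

# Made-up response in coded units: cpu matters most, memory not at all.
def toy_response(x: Dict[str, int]) -> float:
    return 50 + 20 * x["cpu"] + 5 * x["latency"] + 0 * x["memory"]

effects = main_effects(["cpu", "latency", "memory"], toy_response)
order = sorted(effects, key=lambda f: abs(effects[f]), reverse=True)  # static total order
schedule = list(islice(cycle(order), 5))  # round-robin traversal of that order
print(effects)   # {'cpu': 40.0, 'latency': 10.0, 'memory': 0.0}
print(schedule)  # ['cpu', 'latency', 'memory', 'cpu', 'latency']
```

Because the design columns are mutually orthogonal, each effect is estimated without bias from the other main effects, using 8 runs instead of the 2^3 or more a full factorial would need as factors are added.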
![Page 32: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/32.jpg)
Active and Accelerated Learning
![Page 33: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/33.jpg)
Latency hiding
![Page 34: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.fdocuments.us/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/34.jpg)
Saturation