Condor Week 2005 Optimizing Workflows on the Grid 1
Optimizing workflow execution on the Grid
Gaurang Mehta
Based on “Optimizing Grid-Based Workflow Execution”
Gurmeet Singh, Carl Kesselman, Ewa Deelman
Submitted to HPDC-05
Introduction
Use of workflows on the Grid is becoming widespread in scientific applications.
- Astrophysics
- High Energy Physics
- Biology etc.
Current focus is on
- GUIs for composing workflows
- Standardizing workflow specification languages
- Mapping of tasks in the workflow for optimizing a system metric
- Use of some workflow execution engine to execute the workflow (DAGMan, GRMS, Triana, Webflow etc.)
Performance of the workflow execution engine has not received much attention.
Workflow Model
Parse the workflow description
Create a ready list of executable tasks
Select tasks from the ready list
Identify resources for the tasks
Start the tasks on the resources
Monitor task completion
Dependency analysis
Update the ready list
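The steps of this model can be sketched as a small loop. This is a minimal illustration, not how DAGMan or any other engine is implemented: `run_workflow`, `deps`, and `execute` are hypothetical names, resource matching and dispatch are collapsed into a single synchronous `execute` call, and every parent is assumed to appear as a key of `deps`.

```python
from collections import deque


def run_workflow(deps, execute):
    """Run tasks in dependency order.

    deps maps each task name to the set of tasks it depends on.
    execute is a callable standing in for "identify a resource and
    start the task"; here it runs synchronously (an assumption).
    """
    # Parse the workflow description into parent/child bookkeeping.
    remaining = {task: set(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    # Create the ready list: tasks with no unmet dependencies.
    ready = deque(sorted(t for t, p in remaining.items() if not p))
    order = []
    while ready:
        task = ready.popleft()        # select a task from the ready list
        execute(task)                 # identify a resource, start the task
        order.append(task)            # task completion observed
        for child in children[task]:  # dependency analysis
            remaining[child].discard(task)
            if not remaining[child]:
                ready.append(child)   # update the ready list
    if len(order) != len(deps):
        raise ValueError("workflow contains a cycle")
    return order
```

Each pass through the loop pays the ready-list, matching, and dispatch costs once per task, which is why those costs grow with the number of jobs.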
Workflow Model
The costs of workflow execution are in
- Creating and maintaining a ready list
- Resource matching
- Dispatching jobs to resources
These costs can become significant for a fine-granularity workflow (the runtimes of jobs are small) due to
- Large number of jobs in the workflow
- Dependencies between jobs
- Distributed nature of resources
Condor as the Workflow Execution Engine
We use Condor as the Workflow Execution Engine.
Condor-Glidein is used for provisioning the execution resources ahead of time.
- Resource provisioning allows experiments to isolate and examine the workflow execution overheads
Based on the workflow execution costs described earlier, the factors that affect the performance in the context of the Condor system are the following
- Scheduling interval (schedd, negotiator)
- Job Dispatch Rate (schedd)
- Job Submission rate (DAGMan, schedd)
Montage Workflow Structure
4500 total jobs
890 jobs top level
2600 jobs second level
10 minutes
100 processors
100% efficiency
Execution Environment
Condor Pool
- Submit Host: DAGMan, SCHEDD
- Central Manager: COLLECTOR, NEGOTIATOR
- 100 Worker Nodes from NCSA TeraGrid cluster: STARTD
Baseline Condor Performance
Scheduling Interval
The negotiation cycle is the process of identifying resources for jobs.
The interval between two successive negotiation cycles is the scheduling interval.
It can be controlled in a variety of ways
- Fixed Scheduling Interval
- Starting a negotiation cycle at the submission of each job, no more often than once every 20 seconds
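The two policies can be compared with a simplified timing model. The sketch below is an assumption-laden illustration, not Condor's negotiator logic: the function names are hypothetical, times are whole seconds, a cycle is treated as instantaneous, and `min_gap=20` mirrors the 20-second limit stated above.

```python
import bisect


def fixed_interval_starts(interval, horizon):
    """Cycle start times under a fixed scheduling interval, from 0 to horizon."""
    return list(range(0, horizon + 1, interval))


def submission_triggered_starts(submit_times, min_gap=20):
    """Start a cycle at each job submission, but never sooner than
    min_gap seconds after the previous cycle started."""
    starts = []
    for t in sorted(submit_times):
        if not starts:
            starts.append(t)
        elif t > starts[-1]:
            # A submission after the last cycle start triggers a new
            # cycle, delayed if needed to respect the minimum gap.
            starts.append(max(t, starts[-1] + min_gap))
        # Otherwise the job is picked up by the already-planned cycle.
    return starts


def wait_for_cycle(submit_time, starts):
    """Seconds a job waits for the first cycle at or after its submission."""
    i = bisect.bisect_left(starts, submit_time)
    return starts[i] - submit_time if i < len(starts) else None
```

In this model a job submitted 10 seconds after a cycle waits 290 seconds for the next one under a 300-second fixed interval, while under submission-triggered scheduling the wait is bounded by `min_gap`.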
Scheduling at Job Submission
Panels: 30 seconds, 5 minutes, 10 minutes
Fixed Scheduling Interval
Panels: 30 seconds, 5 minutes, 10 minutes
Effect of Scheduling interval
Job Dispatch Rate
Dispatch rate is the rate at which the scheduler can start jobs on the remote resources.
Throttled using the JOB_START_DELAY setting.
The default setting of 2 seconds limits the load on the submit machine and on the scheduler.
The artificial delay can be expensive if the workflow contains many small jobs.
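A rough lower bound shows why the delay matters. The sketch assumes a single scheduler that starts one job at a time and waits the full delay between consecutive starts; `min_dispatch_time` is a hypothetical helper, and the job count is the 4500-job Montage workflow mentioned earlier.

```python
def min_dispatch_time(num_jobs, job_start_delay):
    """Lower bound, in seconds, on the time one scheduler needs to start
    num_jobs jobs when it waits job_start_delay seconds between starts."""
    if num_jobs <= 0:
        return 0
    # The first job starts immediately; each later job waits one delay.
    return (num_jobs - 1) * job_start_delay


if __name__ == "__main__":
    for delay in (0, 1, 2):
        seconds = min_dispatch_time(4500, delay)
        print(f"delay={delay}s -> at least {seconds / 60:.1f} minutes")
```

Under these assumptions, a 2-second delay alone accounts for about 150 minutes of dispatching for 4500 jobs, regardless of how many processors are idle.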
Job Dispatch Rate
JSD (job start delay): 0 seconds, 1 second, 2 seconds
Job submission rate
Rate at which DAGMan submits jobs to the Condor queue.
With a faster dispatch rate, the job submission rate becomes the limiting factor.
Submission rate depends on the dependencies in a workflow.
Restructuring a workflow to reduce dependencies can increase submission rate.
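One form of restructuring is to group the jobs of a workflow level into a few composite jobs, so the engine tracks and submits far fewer nodes. The sketch below is only an illustration: `cluster_level` is a hypothetical helper, and the round-robin assignment of jobs to clusters is an assumption, since the talk does not say how jobs are grouped.

```python
def cluster_level(jobs, num_clusters):
    """Split the jobs of one workflow level into at most num_clusters
    composite jobs, assigning jobs round-robin (an assumed policy)."""
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [jobs[i::num_clusters] for i in range(num_clusters)]
    # Drop empty clusters when there are fewer jobs than clusters.
    return [c for c in clusters if c]


if __name__ == "__main__":
    # 890 is the size of the top level of the Montage workflow.
    top_level = [f"job{i}" for i in range(890)]
    for k in (1, 2):
        sizes = [len(c) for c in cluster_level(top_level, k)]
        print(f"{k} cluster(s) per level -> composite job sizes {sizes}")
```

With 2 clusters per level, the 890 top-level jobs become 2 composite jobs of 445 jobs each, so the engine handles 2 nodes at that level instead of 890.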
Workflow Restructuring
DAGMan for each composite job
Panels: 1 cluster per level, 2 clusters per level
Condor cluster for each composite job
Conclusion
Condor is a high-throughput system and the default configuration works well for long-running jobs.
We are interested in high performance using Condor for fine-granularity workflows.
It is possible to improve the performance by modifying the configuration parameters and using Condor features like clustering.
90% reduction in the workflow completion time for the Montage fine-granularity workflow.
The reduction possible depends on the workflow structure, granularity, and number of available resources.
Future Work
Investigate the tradeoff between the resource requirements and the workflow completion time.
Investigate the effect of granularity on the workflow performance.
Read “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, and Ewa Deelman (submitted to HPDC-05), available at http://pegasus.isi.edu/publications.htm