Condor Week 2005: Optimizing Workflows on the Grid

Optimizing workflow execution on the Grid

Gaurang Mehta - gmehta@isi.edu

Based on “Optimizing Grid-Based Workflow Execution”

Gurmeet Singh, Carl Kesselman, Ewa Deelman

Submitted to HPDC-05

Introduction

The use of workflows on the Grid is becoming widespread in scientific applications:
- Astrophysics
- High Energy Physics
- Biology, etc.

Current focus is on:
- GUIs for composing workflows
- Standardizing workflow specification languages
- Mapping of tasks in the workflow to optimize a system metric
- Use of some workflow execution engine to execute the workflow (DAGMan, GRMS, Triana, Webflow, etc.)

The performance of the workflow execution engine has not received much attention.

Workflow Model

Parse the workflow description

Create a ready list of executable tasks

Select tasks from the ready list

Identify resources for the tasks

Start the tasks on the resources

Monitor task completion

Dependency analysis

Update the ready list
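
The steps of this model can be sketched as a small loop. This is only an illustrative sketch, not how DAGMan or any other engine is implemented: the function name `run_workflow`, the `deps` mapping, and the `execute` callback are hypothetical, every parent is assumed to appear as a key in `deps`, and tasks run one at a time so that "monitoring completion" is simply the call returning.

```python
from collections import deque


def run_workflow(deps, execute):
    """Run tasks in dependency order.

    deps maps each task name to the set of tasks it depends on
    (hypothetical input format; every parent must also be a key).
    execute is a callback standing in for resource selection and dispatch.
    Returns the order in which tasks were run.
    """
    # Parse the workflow description into per-task bookkeeping.
    remaining = {task: set(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    # Create the ready list: tasks with no unfinished parents.
    ready = deque(sorted(t for t, p in remaining.items() if not p))
    order = []
    while ready:
        task = ready.popleft()          # select a task from the ready list
        execute(task)                   # identify a resource and start the task
        order.append(task)              # the call returned: task completed
        for child in children[task]:    # dependency analysis
            remaining[child].discard(task)
            if not remaining[child]:
                ready.append(child)     # update the ready list
    if len(order) != len(deps):
        raise ValueError("workflow contains a dependency cycle")
    return order
```

For a diamond-shaped workflow such as `{"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}`, task `d` only enters the ready list after both `b` and `c` have completed.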

Workflow Model

The costs of workflow execution are in:
- Creating and maintaining a ready list

- Resource matching

- Dispatching jobs to resources

These costs can become significant for a fine-granularity workflow (one in which the runtimes of jobs are small) due to:
- Large number of jobs in the workflow

- Dependencies between jobs

- Distributed nature of resources

Condor as the Workflow Execution Engine

We use Condor as the workflow execution engine.

Condor-Glidein is used for provisioning the execution resources ahead of time.
- Resource provisioning allows experiments to isolate and examine the workflow execution overheads

Based on the workflow execution costs described earlier, the factors that affect performance in the context of the Condor system are the following:
- Scheduling interval (schedd, negotiator)

- Job Dispatch Rate (schedd)

- Job Submission rate (DAGMan, schedd)

Montage Workflow Structure

[Figure: Montage workflow structure. Annotations: 4500 total jobs; 890 jobs at the top level; 2600 jobs at the second level; 10 minutes; 100 processors; 100% efficiency.]

Execution Environment

[Figure: execution environment. Labels: Submit Host (DAGMan, SCHEDD); Central Manager (COLLECTOR, NEGOTIATOR); Condor Pool (STARTD) made up of 100 worker nodes from the NCSA TeraGrid cluster.]

Baseline Condor Performance

Scheduling Interval

The negotiation cycle is the process of identifying resources for jobs.

The interval between two successive negotiation cycles is the scheduling interval.

It can be controlled in a variety of ways:
- Fixed scheduling interval
- Starting a negotiation cycle at the submission of each job, with cycles no more frequent than one every 20 seconds
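
To make the difference between the two policies concrete, here is a toy model of when a submitted job first sees a negotiation cycle. It is a sketch under simplifying assumptions, not Condor's actual logic: the function names are hypothetical, times are whole seconds, fixed cycles are assumed to start at time 0, and a job is assumed to be considered in the first cycle that starts at or after its submission.

```python
def next_fixed_cycle(submit_time, interval):
    """First cycle at or after submit_time when cycles run at
    0, interval, 2*interval, ... (integer seconds)."""
    return -(-submit_time // interval) * interval  # ceiling division


def triggered_cycles(submit_times, min_gap=20):
    """Cycle start time seen by each job when every submission requests
    a cycle, but cycles start at least min_gap seconds apart."""
    result = []
    last = None  # start time of the most recently scheduled cycle
    for t in sorted(submit_times):
        if last is not None and last >= t:
            # A cycle is already scheduled at or after this submission.
            result.append(last)
            continue
        last = t if last is None else max(t, last + min_gap)
        result.append(last)
    return result
```

In this model a job submitted 5 seconds after a fixed 300-second cycle waits until second 300, while under submission-triggered scheduling it waits at most for the 20-second spacing to elapse.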

Scheduling at Job Submission

[Figure: three panels labeled 30 seconds, 5 minutes, and 10 minutes]

Fixed Scheduling Interval

[Figure: three panels labeled 30 seconds, 5 minutes, and 10 minutes]

Effect of Scheduling interval

Job Dispatch Rate

Dispatch rate is the rate at which the scheduler can start jobs on the remote resources.

It is throttled using the JOB_START_DELAY setting.

The default setting of 2 seconds limits the load on the submit machine and on the scheduler.

This artificial delay can be expensive if the workflow contains many small jobs.
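
A back-of-the-envelope calculation shows why the delay matters for small jobs. The sketch below assumes the scheduler starts one job per delay period and nothing else limits it; the function names and the 10-second mean runtime are hypothetical, while the 4500-job count and the 100 processors are the Montage figures quoted earlier.

```python
def min_dispatch_time(num_jobs, job_start_delay):
    """Lower bound (seconds) on the time needed to start num_jobs jobs
    when consecutive starts are separated by job_start_delay seconds."""
    return max(num_jobs - 1, 0) * job_start_delay


def steady_state_running(mean_runtime, job_start_delay, processors):
    """Approximate number of jobs running at once when one job starts
    every job_start_delay seconds and each runs mean_runtime seconds."""
    if job_start_delay == 0:
        return processors  # dispatch is no longer the limit in this model
    return min(processors, mean_runtime / job_start_delay)


# 4500 jobs with a 2-second delay: about 150 minutes spent on dispatch alone.
dispatch_seconds = min_dispatch_time(4500, 2)

# Hypothetical 10-second jobs: only about 5 of 100 processors stay busy.
busy = steady_state_running(10, 2, processors=100)
```

Under these assumptions the dispatch delay, not the available processors, bounds the completion time of a fine-granularity workflow.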

Job Dispatch Rate

[Figure: three panels for JOB_START_DELAY (JSD) of 0 seconds, 1 second, and 2 seconds]

Job submission rate

Rate at which DAGMan submits jobs to the Condor queue.

With a faster dispatch rate, the job submission rate becomes the limiting factor.

Submission rate depends on the dependencies in a workflow.

Restructuring a workflow to reduce dependencies can increase submission rate.
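
One way to picture such a restructuring is to group the jobs at each level of the workflow into a small number of composite jobs. The sketch below is an illustration only, not the restructuring used in the paper: `levels`, `cluster_by_level`, and the round-robin assignment are hypothetical choices, and `deps` maps each job to the set of jobs it depends on.

```python
def levels(deps):
    """Level of each job: 0 if it has no parents, else 1 + max parent level."""
    level = {}

    def visit(job):
        if job not in level:
            level[job] = 1 + max((visit(p) for p in deps[job]), default=-1)
        return level[job]

    for job in deps:
        visit(job)
    return level


def cluster_by_level(deps, clusters_per_level):
    """Group jobs of the same level into composite jobs and return the
    dependencies between composite jobs, keyed by (level, cluster index)."""
    level = levels(deps)
    by_level = {}
    for job in sorted(deps):
        by_level.setdefault(level[job], []).append(job)

    cluster_of = {}
    for lvl, jobs in by_level.items():
        k = min(clusters_per_level, len(jobs))
        for i, job in enumerate(jobs):
            cluster_of[job] = (lvl, i % k)  # round-robin assignment

    cdeps = {c: set() for c in set(cluster_of.values())}
    for job, parents in deps.items():
        for parent in parents:
            # Parents are always on a lower level, so no self-dependencies.
            cdeps[cluster_of[job]].add(cluster_of[parent])
    return cdeps
```

With one cluster per level, a seven-job, three-level workflow with six dependencies collapses to three composite jobs and two dependencies, so the engine has far fewer submissions and dependency checks to process. A composite job only becomes ready once all of its parent clusters have finished, which is the price paid for the smaller graph.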

Workflow Restructuring

DAGMan for each composite job

[Figure: two panels labeled 1 cluster per level and 2 clusters per level]

Condor cluster for each composite job

Conclusion

Condor is a high-throughput system, and the default configuration works well for long-running jobs.

We are interested in achieving high performance with Condor for fine-granularity workflows.

It is possible to improve performance by modifying the configuration parameters and using Condor features such as clustering.

A 90% reduction in the workflow completion time was obtained for the Montage fine-granularity workflow.

The reduction possible depends on the workflow structure, the granularity, and the number of available resources.

Future Work

Investigate the tradeoff between the resource requirements and the workflow completion time.

Investigate the effect of granularity on the workflow performance.

Read “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, and Ewa Deelman (submitted to HPDC-05), available at http://pegasus.isi.edu/publications.htm