Dynamic DAGMan with ClassAds

Page 1: Dynamic DAGMan with ClassAds

Condor Team Member
Computer Sciences Department
University of Wisconsin-Madison

[email protected]
http://www.cs.wisc.edu/condor

Dynamic DAGMan with ClassAds

Himani Apte

Page 2: Dynamic DAGMan with ClassAds

www.cs.wisc.edu/condor

Outline

› DAGMan workflow management

› Motivation for dynamic DAGMan

› ClassAds

› Putting together: DAGMan + ClassAds

› Looking ahead

Page 3: Dynamic DAGMan with ClassAds

DAGMan

› Directed Acyclic Graph Manager

› Meta-scheduler for Condor

› DAG: set of jobs with dependencies

› Manages submission of DAG jobs

› Enforces execution order

› DAGMan itself is a Condor job!

Page 4: Dynamic DAGMan with ClassAds

Example DAG

Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
Parent A Child B C
Parent B C Child D
Script PRE A input.sh
Script POST D output.sh

[Diagram: A is the parent of B and C; B and C are the parents of D]
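The dependencies in the example DAG above can be checked with a minimal Python sketch. It is an illustration only, not DAGMan's own parser: `parse_dag` is a hypothetical helper that reads just the Job and Parent/Child lines and ignores Script lines, and the ordering comes from the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

DAG_TEXT = """\
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
Parent A Child B C
Parent B C Child D
Script PRE A input.sh
Script POST D output.sh
"""

def parse_dag(text):
    """Return (jobs, parents_of) from the Job and Parent/Child lines."""
    jobs = {}        # node name -> submit file name
    parents_of = {}  # node name -> set of parent node names
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword == "job":
            jobs[words[1]] = words[2]
            parents_of.setdefault(words[1], set())
        elif keyword == "parent":
            # Everything before "Child" is a parent, everything after is a child.
            split = [w.lower() for w in words].index("child")
            parents, children = words[1:split], words[split + 1:]
            for child in children:
                parents_of.setdefault(child, set()).update(parents)
        # Script lines are deliberately ignored in this sketch.
    return jobs, parents_of

jobs, parents_of = parse_dag(DAG_TEXT)
# TopologicalSorter takes a mapping of node -> predecessors.
order = list(TopologicalSorter(parents_of).static_order())
print(order)
```

A always comes first and D last; the relative order of B and C is not fixed, since neither depends on the other.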

Page 5: Dynamic DAGMan with ClassAds

Simplified state diagram of a DAG node

[State diagram with states: Waiting, Pre-running, Submitted, Post-running, Done, Failed]
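The node states can be written down as a small state machine. The arrows of the diagram did not survive in this transcript, so the transition table below is an assumption: a node moves forward through its PRE script, job submission and POST script, and any of those steps may fail. `NodeState`, `TRANSITIONS` and `advance` are illustrative names, not DAGMan internals.

```python
from enum import Enum

class NodeState(Enum):
    WAITING = "Waiting"
    PRE_RUNNING = "Pre-running"
    SUBMITTED = "Submitted"
    POST_RUNNING = "Post-running"
    DONE = "Done"
    FAILED = "Failed"

# Assumed transitions (the slide's arrows are not recoverable).
TRANSITIONS = {
    NodeState.WAITING: {NodeState.PRE_RUNNING},
    NodeState.PRE_RUNNING: {NodeState.SUBMITTED, NodeState.FAILED},
    NodeState.SUBMITTED: {NodeState.POST_RUNNING, NodeState.FAILED},
    NodeState.POST_RUNNING: {NodeState.DONE, NodeState.FAILED},
    NodeState.DONE: set(),    # terminal
    NodeState.FAILED: set(),  # terminal
}

def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target

state = NodeState.WAITING
for nxt in (NodeState.PRE_RUNNING, NodeState.SUBMITTED,
            NodeState.POST_RUNNING, NodeState.DONE):
    state = advance(state, nxt)
print(state.value)
```

Nodes without PRE or POST scripts would skip the corresponding states; the sketch leaves that case out for brevity.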

Page 6: Dynamic DAGMan with ClassAds

DAGMan: important properties

› Monitors job state using Condor logs

› Simple and clean recovery model
  • Rescue DAG: saves state at failure
  • Restart: reconstruct internal state

› Scripts allow “lazy” planning

› Throttling parameters
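The rescue DAG idea can be sketched in a few lines of Python. The sketch assumes that completed nodes are recorded by appending DONE to their Job lines, a detail the slides do not spell out; `write_rescue` is a hypothetical helper and not part of DAGMan.

```python
def write_rescue(dag_lines, done_nodes):
    """Copy the DAG description, marking already-completed nodes as DONE."""
    rescued = []
    for line in dag_lines:
        words = line.split()
        is_job_line = len(words) >= 3 and words[0].lower() == "job"
        # Assumption: a trailing DONE on a Job line means "do not rerun".
        if is_job_line and words[1] in done_nodes and words[-1].upper() != "DONE":
            line = f"{line} DONE"
        rescued.append(line)
    return rescued

dag = ["Job A A.condor", "Job B B.condor", "Parent A Child B"]
rescue = write_rescue(dag, {"A"})
print("\n".join(rescue))
```

Resubmitting such a file would let the workflow resume at B without rerunning A.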

Page 7: Dynamic DAGMan with ClassAds

Outline

› DAGMan workflow management

› Motivation for dynamic DAGMan

› ClassAds

› Putting together: DAGMan + ClassAds

› Looking ahead

Page 8: Dynamic DAGMan with ClassAds

Motivation for dynamic DAGMan

› DAG: complete execution order

› Flexibility to make run-time decisions
  • Which subset of DAG nodes should execute?
  • When should node X execute?

› Conditional DAGs
  • Associate a condition with DAG edges
  • Simplest condition: successful completion of parent nodes

Page 9: Dynamic DAGMan with ClassAds

Conditional DAG: examples

[Figure, Example 1: node A has children B and C. Condition: A.x == true. If yes, B runs; if no, C runs]

[Figure, Example 2: parents P1 and P2 share child C. Condition: P1.x OR P2.x]

Page 10: Dynamic DAGMan with ClassAds

Motivation for dynamic DAGMan

› Scripts can be leveraged for lazy planning
  • For simple conditions
    • E.g. exit value of job
  • Modify DAG structure
    • E.g. convert branch-not-taken to no-op/empty

› We want a generic solution

› Supported by “Dynamic DAGMan”

Page 11: Dynamic DAGMan with ClassAds

Outline

› DAGMan workflow management

› Motivation for dynamic DAGMan

› ClassAds

› Putting together: DAGMan + ClassAds

› Looking ahead

Page 12: Dynamic DAGMan with ClassAds

ClassAds

› Classified advertisements

› Used extensively in Condor
  • Define jobs, machines, resources
  • Define conditions, triggers, requirements
  • Maintain internal state

Page 13: Dynamic DAGMan with ClassAds

ClassAds

› List of attribute-value pairs
  • Simple value types: integers, strings
  • Complex types: lists, expressions, ClassAds

› Matchmaking framework
  • Tests match between two ClassAds
  • Using “Requirements” expression

› Great fit for Dynamic DAGMan
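Matchmaking can be illustrated with a minimal Python sketch. Plain dictionaries and callables stand in for ClassAds here; this is neither the ClassAd language nor any Condor library, and the attribute names and values are chosen only for illustration.

```python
def matches(ad_a, ad_b):
    """Two ads match when each ad's Requirements accepts the other ad."""
    return bool(ad_a["Requirements"](ad_b) and ad_b["Requirements"](ad_a))

# Illustrative job ad: wants a machine with at least 512 units of memory.
job_ad = {
    "Type": "Job",
    "ImageSize": 512,
    "Requirements": lambda other: other.get("Type") == "Machine"
    and other.get("Memory", 0) >= 512,
}

# Illustrative machine ad: accepts jobs whose image fits in its memory.
machine_ad = {
    "Type": "Machine",
    "Memory": 1024,
    "Requirements": lambda other: other.get("Type") == "Job"
    and other.get("ImageSize", 0) <= 1024,
}

small_machine_ad = dict(machine_ad, Memory=256)

print(matches(job_ad, machine_ad))
print(matches(job_ad, small_machine_ad))
```

The match is symmetric: both sides must be satisfied. Dynamic DAGMan reuses the same test between a node's condition and its parents' attributes.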

Page 14: Dynamic DAGMan with ClassAds

Outline

› DAGMan workflow management

› Motivation for dynamic DAGMan

› ClassAds

› Putting together: DAGMan + ClassAds

› Looking ahead

Page 15: Dynamic DAGMan with ClassAds

Putting together: DAGMan + ClassAds

› Dynamic DAGMan research project
  • Work-in-progress
  • Not yet available in Condor

› DAG nodes have associated ClassAds

› Basic node attributes
  • Job identifier, name, type
  • Status (Waiting, Submitted, Done, etc.)

Page 16: Dynamic DAGMan with ClassAds

Dynamic DAGMan: attributes

› Execution characteristics of job
  • Exit value
  • Wall-clock time
  • CPU utilization (local and remote)
  • Network statistics (bytes sent / received)
  • Information about files transferred (for vanilla universe)

› Attributes maintained by Condor for a job

Page 17: Dynamic DAGMan with ClassAds

Dynamic DAGMan: conditions

› Requirements expression
  • Defines trigger condition for the node
  • Arbitrarily complex expression
  • Defined on the attributes of parent nodes

› Use matchmaking to determine if a node can be submitted

Page 18: Dynamic DAGMan with ClassAds

Dynamic DAG: example

[Diagram: node A evaluates condition x == true; if yes, B runs; if no, C runs]

Job A A.condor
Job B B.condor
Job C C.condor
Parent A Child B \
    COND [ ( other.job == A && other.x == true ) ]
Parent A Child C \
    COND [ ( other.job == A && other.x == false ) ]
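The branch in this example can be mimicked in Python to show how the two COND expressions select exactly one child. This is a sketch under stated assumptions: the parent's ClassAd is modelled as a dictionary, `other` refers to that parent ad, and `EDGES` and `children_to_submit` are hypothetical names rather than Dynamic DAGMan internals.

```python
def cond_b(other):
    # Mirrors: other.job == A && other.x == true
    return other.get("job") == "A" and other.get("x") is True

def cond_c(other):
    # Mirrors: other.job == A && other.x == false
    return other.get("job") == "A" and other.get("x") is False

# (parent, child, condition) triples, one per conditional edge.
EDGES = [("A", "B", cond_b), ("A", "C", cond_c)]

def children_to_submit(parent_ad):
    """Return the children whose edge condition accepts the finished parent."""
    return [child for parent, child, cond in EDGES
            if parent == parent_ad["job"] and cond(parent_ad)]

print(children_to_submit({"job": "A", "x": True}))
print(children_to_submit({"job": "A", "x": False}))
```

Because the two conditions are mutually exclusive, only one of B and C is submitted for any value of x.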

Page 19: Dynamic DAGMan with ClassAds

Dynamic DAGMan: example

Job P1 P1.condor
Job P2 P2.condor
Job C C.condor
Parent P1 P2 Child C \
    COND [ (other.job == P1 && other.x == true) ||
           (other.job == P2 && other.x == true) ]

[Diagram: parents P1 and P2 share child C. Condition: P1.x OR P2.x]
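With several parents, the condition has to be tested against more than one ClassAd. The slides do not say whether C waits for both parents, so the Python sketch below assumes one plausible reading: the condition is evaluated against each finished parent's ad, and C may be submitted as soon as any one of them satisfies it. `can_submit` is a hypothetical helper.

```python
def cond_c(other):
    # Mirrors the COND expression: either parent reports x == true.
    return ((other.get("job") == "P1" and other.get("x") is True)
            or (other.get("job") == "P2" and other.get("x") is True))

def can_submit(finished_parent_ads, cond):
    """Assumed semantics: one satisfying parent ad is enough to trigger the child."""
    return any(cond(ad) for ad in finished_parent_ads)

p1_ok = {"job": "P1", "x": True}
p1_bad = {"job": "P1", "x": False}
p2_ok = {"job": "P2", "x": True}

print(can_submit([p1_ok], cond_c))          # only P1 has finished
print(can_submit([p1_bad], cond_c))         # P1 finished, condition unmet
print(can_submit([p1_bad, p2_ok], cond_c))  # P2 satisfies the condition
```

Under this reading C can start before P2 finishes, which a plain Parent/Child edge would not allow.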

Page 20: Dynamic DAGMan with ClassAds

Dynamic DAGMan

› Recovery model is still the same
  • Rescue DAG: saves node state at failure
  • ClassAd attribute-values can be regenerated from Condor logs

› Flexibility to make run-time decisions
  • Which subset of nodes in the DAG should be executed?
  • When should node X be executed?
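Regenerating node attributes from logs amounts to replaying events in order. The Python sketch below assumes a simplified event stream of `(node, event, details)` tuples; this is a made-up stand-in, not the actual Condor log format, and `rebuild_node_ads` is a hypothetical helper.

```python
def rebuild_node_ads(events):
    """Replay events in order and return one attribute dictionary per node."""
    ads = {}
    for node, event, details in events:
        ad = ads.setdefault(node, {"job": node, "status": "Waiting"})
        if event == "submit":
            ad["status"] = "Submitted"
        elif event == "terminated":
            ad["exit_value"] = details["exit_value"]
            # Assumption: a zero exit value means success.
            ad["status"] = "Done" if details["exit_value"] == 0 else "Failed"
    return ads

log = [
    ("A", "submit", {}),
    ("A", "terminated", {"exit_value": 0}),
    ("B", "submit", {}),
]
ads = rebuild_node_ads(log)
print(ads)
```

Since the result depends only on the log, a restarted DAGMan could rebuild the same attributes and re-evaluate pending conditions.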

Page 21: Dynamic DAGMan with ClassAds

Outline

› DAGMan workflow management

› Motivation for dynamic DAGMan

› ClassAds

› Putting together: DAGMan + ClassAds

› Looking ahead

Page 22: Dynamic DAGMan with ClassAds

Looking ahead

› DAG with only implicit edges
  • Parent-child relations embedded in ClassAds
  • Nodes specify
    • Trigger condition
    • Preference for child nodes to run
  • On-the-fly dependency formation based on previous node execution

› DAGMan collaborates with Quill
  • Getting attributes from persistent storage

Page 23: Dynamic DAGMan with ClassAds

Looking ahead

› Allow job to modify/add its attributes
  • Determine what happens after job exits

› Global state control
  • Throttling expression/parameters

› Global DAG-ClassAd
  • Statistics on running, successful and failed jobs
  • E.g. if (#failed jobs > N) run cleanup node
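The global DAG-ClassAd idea can be sketched as a summary over per-node statuses. The feature is described here as future work, so everything below is hypothetical: the attribute names, the `dag_ad` and `should_run_cleanup` helpers, and the choice to count Submitted nodes as running.

```python
from collections import Counter

def dag_ad(node_status):
    """Summarise per-node statuses into DAG-wide counters."""
    counts = Counter(node_status.values())
    return {
        "running": counts["Submitted"],  # assumption: Submitted means running
        "successful": counts["Done"],
        "failed": counts["Failed"],
    }

def should_run_cleanup(ad, max_failed):
    # Mirrors the slide's example: if (#failed jobs > N) run cleanup node.
    return ad["failed"] > max_failed

status = {"A": "Done", "B": "Failed", "C": "Failed", "D": "Submitted"}
summary = dag_ad(status)
print(summary)
print(should_run_cleanup(summary, max_failed=1))
```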

Page 24: Dynamic DAGMan with ClassAds

Thank you

We would like to hear your suggestions!