
A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support

Manuel Caeiro, Zsolt Nemeth, Thierry Priol

Manuel Caeiro: CoreGRID Post Doc (IRISA, Rennes, France; MTA SZTAKI, Budapest, Hungary) and Associated Teacher, University of Vigo, Spain. [email protected]

Zsolt Nemeth: MTA SZTAKI, Budapest, Hungary

Thierry Priol: IRISA, Rennes, France

Manuel Caeiro / A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support 2

Outline of the Presentation

1. Introduction
• Scientific Workflows
• The Chemical Computation Model

2. Proposal
• The Scientific Workflow Language
• The Chemical Workflow Engine
• Dynamicity Support

3. Validation

4. Conclusions and Future Work


1. Introduction

This work was performed in the context of the CoreGRID Network of Excellence:
• IRISA (Rennes): December 2007 – March 2008
• SZTAKI (Budapest): April 2008 – August 2008

[Figure: locations of Vigo, Rennes and Budapest]


1. Introduction: Scientific Workflows

Scientific applications and experiments involve:
• Large numbers of operations
• Large data sets
• Complex algorithms

Example domains: Earth Sciences, Biology, Medical Image Analysis, Astronomy, Weather Prediction, Sub-atomic Physics


1. Introduction: Scientific Workflows

Dynamicity is intrinsic to Scientific Workflows

• Scientists usually introduce modifications and variations in their experiments

• Scientific workflows are not always completely specified
• Data is known dynamically during execution
• Data is distributed and mobile
• The resources are not fixed, but they change during workflow execution


1. Introduction: Scientific Workflows

Dynamicity Requirements (1/2)

– Monitoring
• To observe the progress of the workflow
• To obtain the partial and final results

– Automatic Control
• To support the detection of errors and problems
• To support the control of data values and events

– Reproducibility
• To enable the reproduction of the execution
• It is important to validate the results

– Smart “re-runs”
• To be able to re-start at an already performed stage

– Version Management
• To support and distinguish different “attempts”


1. Introduction: Scientific Workflows

Dynamicity Requirements (2/2)

– User steering
• VCR-like: pause, play, roll-back, etc.
• Checkpoints

– User Manipulation
• To be able to change the abstract workflow descriptions
• To be able to change the data and the parameters

– Adaptation in the Workflow Language
• Controlled change of workflows
• Parametric studies

– Adaptation in the Workflow Management System
• Support execution with different resources
• Support changes in task assignment to resources and services’ instances

[Side labels on the slide: User Driven, Autonomous]


1. Introduction: The Chemical Computation Model

Main Idea: Computation as chemical reactions

Programs are conceived as chemical solutions involving a set of molecules of different types that react with one another in accordance with specific reaction conditions and actions


1. Introduction: The Chemical Computation Model

Molecule types:
– Variables (data)
– Reaction conditions and Actions (instructions)
– Molecule Aggregations (pairs)
– Solutions
• A solution is a container of molecules where chemical computations can take place

Computation:
1. A molecule with a reaction condition “matches” another molecule (or set of molecules) that satisfies its condition
2. The molecules react and the actions are performed
– The matched molecules are consumed
– New molecules are created
3. Return to step 1 until the solution is inert


1. Introduction: The Chemical Computation Model

An example: Compute the maximum value of a set of numbers
– Chemical solution:
• Numbers: 1, 2, 7, 8, 9
• Reaction condition and action: Match x, y; if x>y then replace x, y by x

[Figure: a chemical solution holding the passive molecules (the numbers 1, 2, 7, 8, 9) and one active molecule (the reaction condition and action)]
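The maximum example can be sketched in a few lines of Python. This is only an illustration of the match, react, repeat cycle, not the engine presented here: the function name `react_until_inert` and its parameters are hypothetical, and the random choice of the next pair stands in for the non-determinism of the model.

```python
import random
from itertools import permutations


def react_until_inert(solution, condition, action, rng=None):
    """Apply one reaction rule to a solution until no molecules can react."""
    rng = rng or random.Random()
    molecules = list(solution)
    while True:
        # Step 1: find every ordered pair of distinct molecules that matches.
        matches = [
            (i, j)
            for i, j in permutations(range(len(molecules)), 2)
            if condition(molecules[i], molecules[j])
        ]
        if not matches:
            return molecules  # the solution is inert
        # Non-determinism: any matching pair may react next.
        i, j = rng.choice(matches)
        x, y = molecules[i], molecules[j]
        # Step 2: consume the matched molecules and add the products.
        molecules = [m for k, m in enumerate(molecules) if k not in (i, j)]
        molecules.extend(action(x, y))


# Rule from the slide: match x, y; if x > y then replace x, y by x.
result = react_until_inert(
    [1, 2, 7, 8, 9],
    condition=lambda x, y: x > y,
    action=lambda x, y: [x],
)
print(result)  # [9]
```

Whatever order the pairs react in, each reaction removes the smaller number, so the inert solution holds only the maximum.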


1. Introduction: The Chemical Computation Model

Main properties of the chemical computation model:

• Inherently concurrent

• Natural parallelism. No serialization is imposed

• Non-determinism


2. Proposal

Goal:
• To develop a workflow engine for scientific applications based on the chemical computation model and supporting dynamicity

Steps:
• The Scientific Workflow Language
• The Chemical Workflow Engine
• The Support of Dynamicity


2. Proposal: The Scientific Workflow Language

No Generally Accepted Scientific Workflow Language:
• Several languages exist
• Two main approaches: control-flow and data-flow
• Specific data operators:
o SCUFL: one-to-one, all-to-all
o ASKALON: large data set loops

Solution Adopted:
• To propose a new workflow language involving the most common constructs
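The two SCUFL-style data operators can be illustrated with a small Python sketch. The slide only names the operators, so the reading used here is an assumption: one-to-one pairs the i-th items of two data sets, and all-to-all combines every item of one set with every item of the other. The function names and the sample data are hypothetical.

```python
from itertools import product


def one_to_one(left, right):
    """Pair the i-th item of one data set with the i-th item of the other."""
    if len(left) != len(right):
        raise ValueError("one-to-one needs data sets of equal length")
    return list(zip(left, right))


def all_to_all(left, right):
    """Combine every item of one data set with every item of the other."""
    return list(product(left, right))


data_a = ["A.1", "A.2", "A.3"]
data_b = ["B.1", "B.2", "B.3"]

o2o = one_to_one(data_a, data_b)  # 3 pairs: (A.1, B.1), (A.2, B.2), (A.3, B.3)
a2a = all_to_all(data_a, data_b)  # 9 pairs: every A item with every B item
```

Under this reading, one-to-one produces N function executions for two data sets of N items, while all-to-all produces N × N.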


2. Proposal: The Scientific Workflow Language

Main Features:
• It is an extension to Event-driven Process Chains (EPCs)
• Events represent the state
• Data Elements are related to Events (inputs and outputs of Functions)
• Resources are used to process Functions
• Connector Types: AND/OR/XOR-split/join, Sub-process, Loops, Data-Loops, O2O, A2A

[Figure legend: Function, Connector, Event, Data Element, Resource]


2. Proposal: The Scientific Workflow Language

An Example: The WIEN2k workflow from ASKALON

[Figure: workflow diagram. Elements shown: Init, Event1, LAPW0, Data-LOOP-split, LAPW1-K1, LAPW1-K2, ..., LAPW1-Kn, Event21 ... Event2n, Event31 ... Event3n, Data-LOOP-join, the resources R1 and R2, and the data elements Data1, Data21 and Data31]

2. Proposal: The Chemical Workflow Engine

Two main kinds of molecules:
• Active Molecules: Function, Connector
• Passive Molecules: Event, Data Element, Resource

Example reaction: Connector + Event(s) + Data Element(s) → Event(s) + Data Element(s)


2. Proposal: The Chemical Workflow Engine

Functions evolve through 4 states:
• Disabled: the function is not activated; it has not yet matched its input Event
• Enabled: the function has not yet matched its input Data Elements
• Ready: the function has not yet been assigned to appropriate Resources
• Initiated: the function is being performed

Each state is represented by a different molecule
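The four states can be read as a small state machine in which each transition waits for one kind of passive molecule. The Python sketch below shows only that ordering. The names `State`, `TRANSITIONS` and `advance` are hypothetical, and whether a matched molecule is consumed is deliberately left out.

```python
from enum import Enum


class State(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    READY = "ready"
    INITIATED = "initiated"


# For each state: the kind of passive molecule still to be matched,
# and the state reached once it has been matched.
TRANSITIONS = {
    State.DISABLED: ("event", State.ENABLED),
    State.ENABLED: ("data", State.READY),
    State.READY: ("resource", State.INITIATED),
}


def advance(state, available):
    """Follow transitions for as long as the needed kind of molecule is available."""
    trace = [state]
    while state in TRANSITIONS:
        needed, next_state = TRANSITIONS[state]
        if needed not in available:
            break  # the function waits in its current state
        state = next_state
        trace.append(state)
    return trace


# With an event, data and a resource, the function reaches Initiated.
full = advance(State.DISABLED, {"event", "data", "resource"})
# Without a resource, it stays Ready.
stalled = advance(State.DISABLED, {"event", "data"})
```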


2. Proposal: The Chemical Workflow Engine

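[Figure: the chemical solution, divided into Disabled, Enabled, Ready and Initiated parts. It holds Disabled Functions, Disabled Connectors, Events, Data Elements and Resources; a function becomes an Enabled Function with an Event, a Ready Function with a Data Element, and an Initiated Function with a Resource]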


2. Proposal: The Chemical Workflow Engine

Connectors evolve through 2 states:
• Disabled: the connector is not activated; it has not yet matched its input Event(s)
• Enabled: the connector has not yet matched its input Data Elements

Each state is represented by a different molecule


3. An HOCL Workflow Engine

[Figure: Data One-to-One Connector example in the chemical solution, which holds Disabled Functions, Disabled Connectors, Events, Data Elements and Resources. Labels shown: functions F.A (Ev.1, D.A.1..n, Resource), F.B (Ev.2, D.B.1..n, Resource) and F.C; data elements Data A.1,2,…,N, Data B.1,2,…,N and Data C.1,2,…,N; events Ev.1, Ev.2 and Ev.3.1…3.N; connector copies + 1 Connector, + 2 Connector, …, + N Connector; and one thread with Data A.1, Data B.1, Data C.1 and Ev.3.1]


2. Proposal: The Chemical Workflow Engine

Structure of the Chemical Workflow Engine:
• Separated into 4 sub-solutions: one for each state
• Transfer of molecules among sub-solutions

Operations in the Workflow Engine:
• Compilation: the molecules representing the Disabled Functions and Connectors corresponding to the process definition are introduced
• Data Population: the molecules representing the Input Data Elements related to a case are introduced
• Resource Population: the molecules representing the available Resources are introduced
• Instance Creation: the molecules representing the initial Events are introduced
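The structure and the four operations can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not the engine itself: the class `EngineSketch`, its method names and the single shared pool for passive molecules are all hypothetical. Keeping the disabled molecule in place when a copy moves on follows the later statement that disabled molecules are not eliminated.

```python
class EngineSketch:
    STATES = ("disabled", "enabled", "ready", "initiated")

    def __init__(self):
        # One sub-solution per state of the active molecules.
        self.sub_solutions = {state: [] for state in self.STATES}
        # Assumption: passive molecules are kept in one shared pool.
        self.passive = []

    def compile(self, functions, connectors):
        """Compilation: introduce the Disabled Functions and Connectors."""
        self.sub_solutions["disabled"] += [("function", f) for f in functions]
        self.sub_solutions["disabled"] += [("connector", c) for c in connectors]

    def populate_data(self, data_elements):
        """Data Population: introduce the input Data Elements of a case."""
        self.passive += [("data", d) for d in data_elements]

    def populate_resources(self, resources):
        """Resource Population: introduce the available Resources."""
        self.passive += [("resource", r) for r in resources]

    def create_instance(self, initial_events):
        """Instance Creation: introduce the initial Events."""
        self.passive += [("event", e) for e in initial_events]

    def transfer(self, molecule, source, target):
        """Move a molecule between sub-solutions; disabled molecules are kept."""
        if molecule not in self.sub_solutions[source]:
            raise ValueError("molecule is not in the source sub-solution")
        if source != "disabled":
            self.sub_solutions[source].remove(molecule)
        self.sub_solutions[target].append(molecule)


engine = EngineSketch()
engine.compile(functions=["LAPW0"], connectors=["Data-LOOP-split"])
engine.populate_data(["Data1"])
engine.populate_resources(["R1"])
engine.create_instance(["Event1"])
engine.transfer(("function", "LAPW0"), "disabled", "enabled")
```

The element names in the usage lines are borrowed from the example workflow purely as sample data.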


2. Proposal: The Chemical Workflow Engine

[Figure: the four operations (Compilation, Data Population, Instance Creation, Resource Population) and the Input Data]


2. Proposal: The Chemical Workflow Engine

Identifiers:
• Element Identifier: distinguishes among the several elements included in a process specification.
• Process Schema Identifier: distinguishes among process specifications.
– It has two parts: a process number and a version number.
– Included in Functions, Connectors and Events.
• Instance Identifier: distinguishes among the several instances.
– It includes a thread identifier (numbered Data Elements).
– Included in Events and Data Elements and also in Functions and Connectors in states Enabled, Ready and Initiated.
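One way to picture how the identifiers could guide matching is the Python sketch below. It is an assumption-laden illustration: the class names, the `input_event` field and the `enable` function are hypothetical, and the only facts taken from the slide are which identifiers each kind of molecule carries.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SchemaId:
    process: int  # process number
    version: int  # version number


@dataclass(frozen=True)
class InstanceId:
    instance: int
    thread: int  # thread identifier, e.g. for numbered Data Elements


@dataclass(frozen=True)
class Event:
    element: str  # Element Identifier
    schema: SchemaId
    instance: InstanceId


@dataclass(frozen=True)
class Function:
    element: str  # Element Identifier
    schema: SchemaId
    input_event: str  # hypothetical: element identifier of the awaited Event
    state: str = "disabled"
    instance: Optional[InstanceId] = None  # set only from state Enabled on


def enable(function, event):
    """Return an Enabled copy of a Disabled function, or None if no match."""
    if function.state != "disabled":
        return None
    if function.schema != event.schema or function.input_event != event.element:
        return None
    # The Enabled molecule inherits the Instance Identifier of the Event.
    return replace(function, state="enabled", instance=event.instance)


lapw0 = Function("LAPW0", SchemaId(1, 1), input_event="Event1")
event = Event("Event1", SchemaId(1, 1), InstanceId(instance=7, thread=0))
enabled = enable(lapw0, event)
other_version = enable(lapw0, replace(event, schema=SchemaId(1, 2)))
```

In this sketch an Event of a different process version does not react with the function, which is one way to keep versions and instances apart in a shared solution.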


2. Proposal: Dynamicity Support

Dynamicity is supported in several ways:

• A workflow specification can be modified by changing the Functions and Connectors contained in the disabled sub-solution.

• The distinction between Event and Data Element molecules makes it possible to separate the workflow specification from the data to be processed.

• Disabled molecules are not eliminated, so several workflow instances can be initiated and executed in parallel.

• The availability of Event molecules makes it possible to develop a steering facility.

• Data Element molecules are not eliminated. This enables the development of monitoring, “smart re-runs” and provenance solutions.


2. Proposal: Dynamicity Support

Addendums to the Identifiers:
• Addendum to the Process Schema Identifier
– Makes it possible to use modified versions of an existing process specification just by including the new molecules.
• Addendum to the Instance Identifier
– Makes it possible to use the data of another instance execution.

We support the 13 change patterns proposed in [18]:


3. Validation

Developed in CLIPS:
• CLIPS provides an environment for the construction of rule-based expert systems
• CLIPS programming is performed by assertions and rules
– Assertions are used to maintain information
– Rules specify a certain action to be performed when a condition is satisfied
• To validate the Chemical Workflow Engine (CWE) we used two kinds of assertions and specific rules:
– Active molecule assertions of two types (Function and Connector) and four possible states (Disabled, Enabled, Ready, Initiated)
– Passive molecule assertions of three types (Event, Data Element and Resource)


4. Conclusions

Summary:
• Scientific workflows are gaining great momentum
• Dynamicity is an intrinsic need in scientific workflows
• A workflow engine based on the Chemical Computation Model has been conceived to support dynamicity needs

[Figure: Scientific Workflow, Chemical Workflow Engine, CLIPS]

Future Work:
• To provide an actual validation


4. Conclusions

Opportunities from the Chemical Computation Model:

• It is parallel in nature: it facilitates the distribution of computations, and parallelization is obtained in a transparent way
– Workflows can be specified in the same way
– Execution of workflows is automatically parallelized

• Change of the role of resources:
– Central “chemical solution” vs. central Workflow engine
– Pull-oriented vs. Push-oriented


Questions and Comments are welcome!!!

[email protected]