The design and implementation of a workflow analysis tool
Vasa Curcin
Department of Computing
Imperial College London
Scientific workflow field
• Scientific workflows: a high-level programming language with explicit graphical representation of flow of data and/or control
• Research into automation of processes supporting scientific research
• Significant role in providing middleware for UK eScience programme: Taverna, Discovery Net, Triana
• Lingua franca of service-oriented computing
Deluge of workflows
• Workflow systems: Meandre, Taverna, Discovery Net, Triana, Kepler, KNIME, Orange, Pentaho, Pegasus, Trident, YAWL, BPEL, LONI, GenePatterns, Galaxy, VisTrails, UGENE, Wildfire
• Application areas: Bioinformatics, Cheminformatics, Environmental Science, Business Intelligence, Astronomy, Sensor informatics, …
Workflow analysis
• There is a need for formal models to capitalize on the benefits of this infrastructure
  o Work evaluated on Discovery Net workflows
  o Concepts applicable to other workflow systems
• Some aims
  o Minimise cost of data movement and processing
  o Provide technology for workflow clients and warehouses (indexing, guided construction…)
• Tasks
  o Safeness
  o Instance bounds
  o Static workflow optimization
  o Establishing polymorphic type profiles of workflows
Underlying models
• Control flow model
  o Process calculus definitions
  o Communication along named channels
    • Fixed for atomic execution, dynamic for streaming
  o New instance of the process launched as soon as the node receives a token
  o Computation tree logic (CTL) modelling execution states
• Data flow model
  o Nodes associated with lambda calculus formulas and term graphs
  o Polymorphic type transformations
  o Rewrite rules defined for sets of nodes as term graph transformations
• Embedding
  o Way of combining the control and data semantics
Workflow analysis tool
• Similarity checker
  o Bisimilarity of processes
• Process profiler
  o Deadlock/livelock detection
  o Reachability
  o Task bounds
• Composability checker
  o Design-time tests
  o Type requirements
  o Polymorphic properties
• Equivalence checker
  o Functional equivalence
• Optimizer
  o Rewrite rules for transformations
Similarity checker
• Based purely on the pi-calculus process model
  o Workflows translated into the process model
  o Parallel composition of independent node processes with named channels
  o Compared in terms of:
    • Internal executions (node actions)
    • Set of observable outputs - define only relevant outputs
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs
[Diagram: Workflow, Process model, Model checker]
Similarity checker: example
• ABC (Another Bisimilarity Checker) used
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs
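The difference between the strong and weak checks can be illustrated on a toy labelled transition system. The following is a minimal sketch in Python, not the ABC tool or its input format: the state names, the `out` action and the helper functions are all hypothetical, and the algorithm is a naive greatest-fixed-point computation that only suits very small systems. Weak bisimilarity is obtained by running the same check on the tau-saturated transitions.

```python
from itertools import product

TAU = "tau"  # silent action, standing in for an internal node execution

def tau_closure(trans):
    """For each state, the states reachable by zero or more tau steps."""
    closure = {s: {s} for s in trans}
    changed = True
    while changed:
        changed = False
        for s in trans:
            for t in list(closure[s]):
                for action, u in trans[t]:
                    if action == TAU and u not in closure[s]:
                        closure[s].add(u)
                        changed = True
    return closure

def saturate(trans):
    """Weak transitions: tau* for the silent action, tau* a tau* for a visible a."""
    closure = tau_closure(trans)
    weak = {s: set() for s in trans}
    for s in trans:
        for p in closure[s]:
            weak[s].add((TAU, p))
            for action, q in trans[p]:
                if action != TAU:
                    weak[s].update((action, r) for r in closure[q])
    return weak

def bisimilar(trans, s0, t0):
    """Start from all state pairs and delete pairs that break the transfer condition."""
    rel = set(product(trans, repeat=2))

    def transfer(p, q):
        forward = all(any(a == b and (p2, q2) in rel for b, q2 in trans[q])
                      for a, p2 in trans[p])
        backward = all(any(a == b and (p2, q2) in rel for a, p2 in trans[p])
                       for b, q2 in trans[q])
        return forward and backward

    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not transfer(*pair):
                rel.discard(pair)
                changed = True
    return (s0, t0) in rel

# Hypothetical workflows: one runs an internal node before emitting `out`,
# the other emits `out` directly.
lts = {
    "s0": {(TAU, "s1")}, "s1": {("out", "s2")}, "s2": set(),
    "t0": {("out", "t1")}, "t1": set(),
}
strong = bisimilar(lts, "s0", "t0")
weak = bisimilar(saturate(lts), "s0", "t0")
print("strong:", strong, "weak:", weak)
```

In this toy case the two workflows differ only by an internal step, so the strong check distinguishes them while the weak check does not.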
Process profiling
• The process algebra representation translated into a Kripke frame
  o Enumerated states denoting the number of instances of each workflow node
  o Transitions of the frame are the node executions
  o Use CTL formulas to query
  o NuSMV model checker employed
• Allows questions such as:
  o Reachability of a particular state
  o Detection of deadlocks and livelocks
  o Safety - some state always executing
  o Bounds on a number of instances of a node
[Diagram: Workflow, Process model, Kripke frame]
Process profiling: example
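• Reachability
  o $\mathrm{EF}\,F^{\tau}_{1}$ – Is there an execution that achieves one instance of F
  o $\mathrm{AF}\,F^{\tau}_{1}$ – Do all executions always achieve one instance of F
• Livelocks
  o $\mathrm{AG}\,(C^{\tau} \rightarrow \mathrm{AG}\,\mathrm{AF}\,C^{\tau})$ – Is there always a livelock with C
  o $\mathrm{EF}\,(C^{\tau} \rightarrow \mathrm{AG}\,\mathrm{AF}\,C^{\tau})$ – Can there be a livelock with C
• Instance bounds
  o $\max_{x}.\,\mathrm{EF}\,A^{\tau}_{x}$ – What is the maximum number of instances of A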
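To make the reachability and bound queries concrete, here is a minimal sketch of how EF, AF and AG can be evaluated by fixed-point iteration over a small explicit Kripke structure. It is written in plain Python rather than NuSMV, the four-state structure and the instance counts are invented for illustration, and the transition relation is assumed to be total (every state has a successor), as CTL semantics requires.

```python
def pre_exists(trans, target):
    """States with at least one successor in `target` (EX)."""
    return {s for s, succs in trans.items() if succs & target}

def pre_all(trans, target):
    """States whose successors all lie in `target` (AX, total relation assumed)."""
    return {s for s, succs in trans.items() if succs and succs <= target}

def least_fixpoint(trans, target, pre):
    result = set(target)
    while True:
        grown = result | pre(trans, result)
        if grown == result:
            return result
        result = grown

def ef(trans, target):
    """EF target: some execution eventually reaches `target`."""
    return least_fixpoint(trans, target, pre_exists)

def af(trans, target):
    """AF target: every execution eventually reaches `target`."""
    return least_fixpoint(trans, target, pre_all)

def ag(trans, target):
    """AG target, computed as the complement of EF (not target)."""
    states = set(trans)
    return states - ef(trans, states - target)

def reachable(trans, start):
    seen, stack = {start}, [start]
    while stack:
        for nxt in trans[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical frame: s1 loops forever without running F, s3 has one instance of F.
trans = {"s0": {"s1", "s2"}, "s1": {"s1"}, "s2": {"s3"}, "s3": {"s3"}}
f_instances = {"s0": 0, "s1": 0, "s2": 0, "s3": 1}
one_f = {s for s, n in f_instances.items() if n == 1}

can_reach = "s0" in ef(trans, one_f)      # EF: is one instance of F achievable?
always_reach = "s0" in af(trans, one_f)   # AF: do all executions achieve it?
max_f = max(f_instances[s] for s in reachable(trans, "s0"))  # instance bound
print(can_reach, always_reach, max_f)
```

The instance bound is computed here as a maximum over reachable states, which for a finite frame answers the same question as repeating the EF query for increasing counts.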
Composability checker
• Polymorphic type formulas for the workflow components/fragments
• When composing:
  o The output and input of each fragment compared in terms of free and bound type variables
  o If no clashes, free variables resolved to form the type formula of the composition
  o Inference engine developed specifically for the tool
• Determines:
  o If a workflow fragment can be reused on a new input
  o Find compatible services in the warehouse
[Diagram: Workflow, Data model, Type formulas]
Composability checker: example
• Fragment of three nodes LMN
  o Input q, with required attributes A, B, D
  o Two outputs u, v
  o A present in both. B in u. D in neither.
• Two outputs can be joined with O
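The LMN example can be approximated with plain attribute sets. The sketch below is a deliberate simplification and not the tool's inference engine: it ignores type variables entirely, and the `FragmentProfile` class, the `apply_fragment` and `can_join` helpers, the extra attribute E, and the rule that a join needs at least one shared attribute are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class FragmentProfile:
    requires: frozenset  # attributes the input must contain
    outputs: dict        # output name -> attributes that output carries

def apply_fragment(profile, input_attrs):
    """Check the input against the requirements and return the output schemas."""
    missing = profile.requires - set(input_attrs)
    if missing:
        raise TypeError(f"input lacks required attributes: {sorted(missing)}")
    return {name: set(attrs) for name, attrs in profile.outputs.items()}

def can_join(left, right):
    """Assumed rule: two outputs can be joined if they share an attribute."""
    return bool(set(left) & set(right))

# Fragment LMN from the slide: requires A, B, D; u carries A and B, v carries A.
lmn = FragmentProfile(
    requires=frozenset({"A", "B", "D"}),
    outputs={"u": {"A", "B"}, "v": {"A"}},
)

schemas = apply_fragment(lmn, {"A", "B", "D", "E"})  # E is an extra, unused attribute
joinable = can_join(schemas["u"], schemas["v"])      # u and v share A

try:
    apply_fragment(lmn, {"A", "B"})                  # D is missing
    rejected = False
except TypeError:
    rejected = True

print(schemas, joinable, rejected)
```

A fuller treatment would carry a type variable for the attributes the fragment does not mention, which is where the free and bound variable comparison described earlier comes in.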
Equivalence tester / optimizer
• Uses a set of node equivalence rules
  o Defined for each workflow system or node subset
  o Algorithm applies allowed transformations to reduce two workflows to the same expression
• Combined with rewrite heuristics
  o Node-specific again
  o Simple example: relational model again
[Diagram: Workflow, Data model, Node equivalences]
Equivalence tester/optimizer: example
• Relational workflow searching for Adverse Drug Reactions in GPRD database
• Rewrite rules
  o Set of relational equivalences
• Heuristics
  o Early projections/selections
  o Late joins
  o Easy scenario – brute force algorithm works
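One of the relational equivalences behind the "early selections" heuristic is that a selection over a join can be moved onto the join input that supplies the attribute it tests. The sketch below applies that single rule to a tiny expression tree and uses the rewritten form as a normal form for comparing two workflows. The `Table`, `Select` and `Join` classes, the table names and the attribute names are hypothetical and are not taken from the GPRD schema or from the tool's rule set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Table:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Select:
    attr: str      # attribute the predicate tests (the predicate itself is elided)
    child: object

@dataclass(frozen=True)
class Join:
    left: object
    right: object

def attrs(expr):
    """Attributes available in the result of an expression."""
    if isinstance(expr, Table):
        return expr.attrs
    if isinstance(expr, Select):
        return attrs(expr.child)
    return attrs(expr.left) | attrs(expr.right)

def push_selections(expr):
    """Rewrite Select(Join(L, R)) into a join with the selection on one input."""
    if isinstance(expr, Table):
        return expr
    if isinstance(expr, Join):
        return Join(push_selections(expr.left), push_selections(expr.right))
    child = push_selections(expr.child)
    if isinstance(child, Join):
        if expr.attr in attrs(child.left):
            return Join(push_selections(Select(expr.attr, child.left)), child.right)
        if expr.attr in attrs(child.right):
            return Join(child.left, push_selections(Select(expr.attr, child.right)))
    return Select(expr.attr, child)

patients = Table("patients", frozenset({"pid", "age"}))
prescriptions = Table("prescriptions", frozenset({"pid", "drug"}))

late = Select("drug", Join(patients, prescriptions))   # filter after the join
early = Join(patients, Select("drug", prescriptions))  # filter before the join

optimized = push_selections(late)
equivalent = push_selections(late) == push_selections(early)
print(optimized == early, equivalent)
```

Comparing normal forms in this way only detects equivalences that the chosen rules can express. The slide's brute-force search over allowed transformations is more general than this single-rule sketch.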
Related and future work
• Data typing
  o COMAD for Kepler
• Workflow process analysis
  o GWorkflowDL
  o YAWL
• New workflow tools with relational structures
  o KNIME
  o Orange
  o Pentaho
• Extensions:
  o Streaming – blocking and batching
  o Improved state reduction algorithms for CTL model
  o Adding more type constructs for polymorphism
Summary
• Workflow analysis needed to improve takeup and exploitation of workflows
  o Enterprise environments
  o Profile resource usage, risk of failure, execution time
  o Support reuse and repurposing
• Separation of control and data aspects allows use of existing model checkers and familiar techniques
  o Process algebras, temporal logics, type polymorphisms, term graphs
• Current version works on Discovery Net/InforSense
  o KNIME, Pentaho very similar – only require extra parsers
  o Full streaming process model for Taverna in the works