Scientific Workflow Systems

Posted on 28-Apr-2015


An introduction to scientific workflows and workflow management systems.


Workflow Systems for Science: Concepts & Tools

Domenico Talia, ICAR-CNR and University of Calabria, 87036 Rende, Italy

by: Seyed Ziae Mousavi Mojab, Wayne State University, 2013

Agenda

● Introduction

● Main programming issues in the area of scientific workflows

● Significant workflow management systems (WMS)

● Issues that are still open in the area of scientific workflows

● Conclusion

Introduction

"workflows provide a declarative way of specifying the high-level logic of an application, hiding the low-level details that are not fundamental for application design."

"A workflow is a well-defined, and possibly repeatable, pattern or systematic organization of activities designed to achieve a certain transformation of data."

● Scientific Workflow:

● Scientific Workflows & open research issues:

Taverna, Pegasus, Triana, Askalon, Kepler, GWES, and Karajan, ...

Workflow Programming

● Textual programming interface

● Visual programming interface

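● Programming Structure:

- Directed Acyclic Graph (DAG): tasks are nodes and dependencies are edges, with no cycles

- Directed Cyclic Graph (DCG): cycles are allowed, so loops and iterations can be expressed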
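The difference between the two structures can be sketched in Python, assuming a workflow is stored as a mapping from each task to the tasks it depends on (the task names are hypothetical). A topological sort yields a valid execution order for a DAG; a cycle, as in a DCG, makes such an order impossible, so a loop has to be handled by an explicit iteration construct instead.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dag: Dict[str, Set[str]] = {
    "fetch": set(),
    "clean": {"fetch"},
    "analyze": {"clean"},
    "plot": {"analyze"},
    "report": {"analyze", "plot"},
}

def execution_order(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return a dependency-respecting order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None

print(execution_order(dag))  # ['fetch', 'clean', 'analyze', 'plot', 'report']

# A cyclic (DCG) variant: 'analyze' feeds back into 'clean' to model a loop.
dcg = dict(dag, clean={"fetch", "analyze"})
print(execution_order(dcg))  # None
```

`graphlib` is part of the standard library from Python 3.9 onward.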

● Workflow Enactment:

- Efficiency: bind tasks to appropriate computing resources

- Robustness: detect and recover from failures through monitoring, checkpoints, and step-by-step execution
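A minimal sketch of the robustness mechanisms listed above, assuming tasks are plain Python callables and that an in-memory dictionary is an acceptable stand-in for a persistent checkpoint store (a real engine would write checkpoints to disk or a database). Completed tasks are skipped on a rerun, and a failing task is retried a bounded number of times before the error is surfaced.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, Callable[[], object]]

def run_with_recovery(tasks: List[Task], checkpoint: Dict[str, object],
                      max_retries: int = 2) -> Dict[str, object]:
    """Run tasks in order, skipping checkpointed ones and retrying failures."""
    for name, func in tasks:
        if name in checkpoint:
            continue  # finished in an earlier run: resume past it
        for attempt in range(max_retries + 1):
            try:
                checkpoint[name] = func()  # store the result as a checkpoint
                break
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: the checkpoint keeps earlier results
    return checkpoint

# Usage: a hypothetical task that fails once with a transient error.
calls = {"flaky": 0}

def flaky() -> str:
    calls["flaky"] += 1
    if calls["flaky"] == 1:
        raise RuntimeError("transient failure")
    return "ok"

state: Dict[str, object] = {}
run_with_recovery([("load", lambda: 42), ("flaky", flaky)], state)
print(state)  # {'load': 42, 'flaky': 'ok'}
```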

● Workflow Design:

- Abstract level: what has to be done at each task, along with information about how the tasks are interconnected

- Concrete level: the implementations and/or resources to be used
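To illustrate the two levels, the sketch below (with hypothetical task names, capability labels, and resource names) describes an abstract workflow only in terms of tasks and their connections, then derives a concrete workflow by binding each task to a resource. The first-match binding rule is an assumption chosen for brevity, not the policy of any particular system.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class AbstractTask:
    name: str
    inputs: Tuple[str, ...]  # upstream tasks this task consumes data from
    requirement: str         # capability the task needs, e.g. "gpu"

@dataclass(frozen=True)
class ConcreteTask:
    name: str
    inputs: Tuple[str, ...]
    resource: str            # resource chosen to run the task

def concretize(abstract: List[AbstractTask],
               resources: Dict[str, Set[str]]) -> List[ConcreteTask]:
    """Bind each abstract task to the first resource that offers its requirement."""
    concrete = []
    for task in abstract:
        match = next((r for r, caps in resources.items() if task.requirement in caps), None)
        if match is None:
            raise ValueError(f"no resource can run {task.name!r}")
        concrete.append(ConcreteTask(task.name, task.inputs, match))
    return concrete

abstract = [AbstractTask("align", (), "cpu"), AbstractTask("train", ("align",), "gpu")]
resources = {"local-cluster": {"cpu"}, "cloud-gpu": {"cpu", "gpu"}}
for t in concretize(abstract, resources):
    print(t.name, "->", t.resource)
```

The abstract description stays unchanged if the resource pool changes; only the concrete binding is recomputed.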

Scientific Workflow Management Systems

Workflow Management Systems (WMS) are software environments that provide tools to define, compose, map, and execute workflows.

- Programming languages: BPEL, UML, Petri nets, XML-based languages, …

● Workflow Design:

- Script-like systems: GridAnt, Karajan, ...

- Graphical-based systems

● Taverna:

- Java-based, open source, developed at the University of Manchester

- Supports the life sciences (design and execution of scientific workflows)

- Can invoke any web service described by WSDL (code reusability)

- Can also invoke local Java services, APIs, … and import data from CSV files or Excel spreadsheets

● Taverna Tools:

i. Taverna Engine
ii. Taverna Workbench
iii. Taverna Server
iv. A command-line tool

● Features of Taverna Workflow:

i. Pipelining
ii. Implicit iteration of service calls
iii. Conditional calling of services
iv. Customizable looping over a service
v. Failover and retry of service calls
vi. Parallel execution
vii. Managing previous runs and workflow results
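Implicit iteration means that a service written for a single value can be handed a list, and the engine calls it once per element. A minimal Python sketch of that behavior, using a hypothetical single-sequence service (`reverse_complement`) rather than Taverna's actual engine:

```python
from typing import Callable, Union

def invoke(service: Callable[[str], str], value: Union[str, list]) -> Union[str, list]:
    """Call a single-value service, iterating implicitly when given a list."""
    if isinstance(value, list):
        return [invoke(service, item) for item in value]  # recurse into nested lists
    return service(value)

def reverse_complement(seq: str) -> str:
    """Hypothetical single-sequence service: reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))

print(invoke(reverse_complement, "ATGC"))          # GCAT
print(invoke(reverse_complement, ["AAC", "GGT"]))  # ['GTT', 'ACC']
```

The service itself never deals with lists; the iteration lives in the engine, which is what lets existing single-item services be reused unchanged.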

● Triana:

- Java-based, open source, developed at Cardiff University
- Modular architecture
- Combines a visual interface with data analysis tools
- Can connect heterogeneous tools (Web services, Java units, ...)
- Uses its own custom workflow language (plus BPEL)
- Supports several workflow patterns, including loops and branches

● Triana Tools:

- Signal analysis
- Image manipulation
- Desktop publishing
- Integration of your own tools

● Pegasus:

- Developed at the University of Southern California
- Runs on desktops, clusters, grids, and clouds
- Used in several scientific areas, including bioinformatics, astronomy, earthquake science, gravitational-wave physics, and ocean science
- Executes the workflow tasks in the order of their dependencies
- Includes a sophisticated error-recovery system

● Pegasus Components:

i. The Mapper:
- builds an executable workflow based on an abstract workflow provided by the user
- can also restructure the workflow for optimization purposes

ii. The Execution Engine:
- executes the tasks in the appropriate order

iii. The Task Manager:
- manages and supervises workflow tasks on local or remote resources
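The division of labor among the three components can be sketched as follows. The abstract workflow is a hypothetical mapping from each task to its dependencies; the mapper's only optimization here is pruning tasks whose outputs are assumed to already exist, and the task manager just records the dispatch instead of submitting to a real local or remote resource.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def mapper(abstract: Dict[str, Set[str]], available: Set[str]) -> Dict[str, Set[str]]:
    """Build an executable plan, dropping tasks whose outputs already exist."""
    return {task: deps - available
            for task, deps in abstract.items() if task not in available}

def task_manager(task: str, log: List[str]) -> None:
    """Stand-in for submitting and supervising a task on some resource."""
    log.append(task)

def execution_engine(plan: Dict[str, Set[str]]) -> List[str]:
    """Dispatch tasks in an order that respects their dependencies."""
    log: List[str] = []
    for task in TopologicalSorter(plan).static_order():
        task_manager(task, log)
    return log

abstract = {"stage_in": set(), "simulate": {"stage_in"}, "analyze": {"simulate"}}
plan = mapper(abstract, available={"stage_in"})
print(execution_engine(plan))  # ['simulate', 'analyze']
```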

● Kepler:

- Java-based, open source, developed at the University of California
- Can execute workflows from a graphical interface or the command line
- Based on the concept of directors, which determine how a workflow's components (actors) are executed
- Runs on local resources and Grids
- Supports foreign-language interfaces through JNI (Matlab actor, Python actor, ...)
- Supports distributed computational resources through Web and Grid service actors
- Used to design and execute various workflows in biology, ecology, geology, chemistry, and astrophysics
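The director idea, that the same actors can run under different execution policies, can be illustrated with a toy Python sketch. The two policies below are hypothetical and far simpler than Kepler's real directors (such as SDF or PN); they produce the same results but fire the actors in a different order.

```python
from typing import Callable, List, Tuple

Actor = Tuple[str, Callable[[int], int]]

def streaming_director(actors: List[Actor], tokens: List[int], trace: List[str]) -> List[int]:
    """Push each token through the whole actor chain before reading the next token."""
    results = []
    for token in tokens:
        for name, fire in actors:
            token = fire(token)
            trace.append(name)
        results.append(token)
    return results

def batch_director(actors: List[Actor], tokens: List[int], trace: List[str]) -> List[int]:
    """Fire each actor on every token before moving on to the next actor."""
    for name, fire in actors:
        tokens = [fire(t) for t in tokens]
        trace.extend([name] * len(tokens))
    return tokens

actors: List[Actor] = [("double", lambda x: x * 2), ("increment", lambda x: x + 1)]
trace_a: List[str] = []
trace_b: List[str] = []
print(streaming_director(actors, [1, 2], trace_a), trace_a)
print(batch_director(actors, [1, 2], trace_b), trace_b)
```

The actors are identical in both runs; only the director changes, which is the separation of concerns the bullet above refers to.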

● Askalon:

- Developed at the University of Innsbruck
- Allows the execution of distributed workflow applications in service-oriented Grids
- Uses the Globus Toolkit as Grid middleware
- Uses a custom XML-based language, AGWL (Abstract Grid Workflow Language)

● Askalon Architecture:

i. Resource Broker
ii. Resource Monitoring
iii. Information Service
iv. Workflow Executor
v. Metascheduler
vi. Performance Prediction
vii. Performance Analysis
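One way the Performance Prediction and Metascheduler components could cooperate is sketched below: predicted run times feed a greedy rule that assigns each independent task to the resource with the earliest predicted finish time. The heuristic, the task and resource names, and the numbers are illustrative assumptions, not a description of Askalon's actual scheduling algorithms.

```python
from typing import Dict, List, Tuple

def greedy_schedule(tasks: List[str], resources: List[str],
                    predicted: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Assign each task to the resource with the earliest predicted finish time."""
    ready = {r: 0.0 for r in resources}  # when each resource becomes free
    assignment: Dict[str, str] = {}
    for task in tasks:
        best = min(resources, key=lambda r: ready[r] + predicted[(task, r)])
        ready[best] += predicted[(task, best)]
        assignment[task] = best
    return assignment

# Hypothetical predicted run times (seconds) for each (task, resource) pair.
predicted = {
    ("t1", "fast"): 2, ("t1", "slow"): 5,
    ("t2", "fast"): 2, ("t2", "slow"): 5,
    ("t3", "fast"): 2, ("t3", "slow"): 3,
}
assignment = greedy_schedule(["t1", "t2", "t3"], ["fast", "slow"], predicted)
print(assignment)  # {'t1': 'fast', 't2': 'fast', 't3': 'slow'}
```

Once the fast resource is busy, the slower one finishes `t3` earlier, which is why accurate predictions matter to the scheduler.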

● Weka4WS:

- Java-based, open source data mining system
- Offers an easy-to-use GUI
- Includes the Knowledge Flow tool
- Data mining algorithms are wrapped as web services
- Executes a whole workflow only on a single computer
- Can use Gridlab to exploit Grid resources
- Provides data and task parallelism
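The two kinds of parallelism can be contrasted with Python's `concurrent.futures`. Local threads stand in here for the remote Web-service invocations a system like Weka4WS would make, and the summary statistics are hypothetical stand-ins for real data mining algorithms.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import List

def partition(data: List[int], n: int) -> List[List[int]]:
    """Split data into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

data = list(range(1, 11))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same operation applied to different partitions.
    partial_sums = list(pool.map(sum, partition(data, 3)))
    # Task parallelism: different analyses run concurrently on the same data.
    futures = {name: pool.submit(fn, data) for name, fn in [("mean", mean), ("max", max), ("min", min)]}
    results = {name: f.result() for name, f in futures.items()}

print(partial_sums, sum(partial_sums))  # [10, 18, 27] 55
print(results)                          # {'mean': 5.5, 'max': 10, 'min': 1}
```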

● GWES (Generic Workflow Execution Service):

- Multi-level abstraction
- Plugin concept (DB2 activity, Grid activity, ...)
- Can be used on clusters, grids, and clouds
- Uses GWorkflowDL, a workflow language based on Petri nets
- Supports exception handling
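Because GWES describes workflows with Petri nets, the core execution rule is easy to sketch: a transition (an activity) is enabled when every input place holds a token, and firing it moves tokens from its input places to its output places. The net below is a hypothetical example, and this toy engine simply fires the first enabled transition at each step; it assumes every transition has at least one input place, since a transition without inputs would stay enabled forever.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical net: transition -> (input places, output places).
Net = Dict[str, Tuple[List[str], List[str]]]

net: Net = {
    "preprocess": (["raw"], ["clean_a", "clean_b"]),
    "simulate": (["clean_a"], ["result"]),
    "archive": (["clean_b"], ["stored"]),
    "report": (["result", "stored"], ["done"]),
}

def run(net: Net, initial: Dict[str, int]) -> Tuple[Counter, List[str]]:
    """Fire enabled transitions until none remain; return final marking and firing order."""
    marking = Counter(initial)
    fired: List[str] = []
    while True:
        enabled = [t for t, (ins, _) in net.items() if all(marking[p] >= 1 for p in ins)]
        if not enabled:
            return marking, fired
        transition = enabled[0]  # deterministic choice: first enabled transition
        ins, outs = net[transition]
        for p in ins:
            marking[p] -= 1  # consume one token from each input place
        for p in outs:
            marking[p] += 1  # produce one token in each output place
        fired.append(transition)

final, fired = run(net, {"raw": 1})
print(fired)   # ['preprocess', 'simulate', 'archive', 'report']
print(+final)  # Counter({'done': 1})
```

`report` can only fire after both parallel branches have produced a token, so the synchronization is expressed by the net's structure rather than by extra control code.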

● DVega:

- Uses reference nets to compose workflow tasks hierarchically
- Has forwarder-receiver components
- Maps tasks to resources

● Karajan:

- Java-based
- Allows users to compose workflows through an XML scripting language and K
- Supports linear and parallel execution
- Supports hierarchical workflows
- Allows monitoring of the execution (checkpointing subsystem)
- Workflows can be modified at runtime

● DIS3GNO:

- Allows users to compose distributed data mining workflows
- Executes workflows on the Knowledge Grid
- Visualizes the results

Functionalities:
i. Metadata management
ii. Design and execution management


Discussion and Research issues

● Workflow formalisms:

- Abstractions for data representation

- Abstractions for concurrent processing orchestration

- Annotating, storing, and retrieving workflow results

● Workflow Lifecycle:

i. Textual or graphical composition

ii. Mapping of the abstract workflow description onto the available resources

iii. Scheduling, monitoring, and debugging of the subsequent execution

● Topics to investigate:

i. Adaptive workflow execution models
ii. High-level tools and languages for workflow composition
iii. Scientific workflow interoperability and openness
iv. Big Data management and knowledge discovery workflows
v. Internet-wide distributed workflow execution
vi. Service-oriented workflows on Cloud infrastructures
vii. Workflow composition and execution in Exascale computing systems
viii. Fault-tolerance and recovery strategies for scientific workflows
ix. Workflow provenance and annotation mechanisms and systems
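Provenance capture, the last topic above, can be sketched as a decorator that records which task ran, on what inputs, and what it produced. The in-memory list, the record fields, and the task functions are assumptions for illustration; the digest step also assumes the inputs are JSON-serializable. Provenance systems typically persist richer records (timestamps, versions, execution hosts).

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List

PROVENANCE: List[Dict[str, Any]] = []  # in-memory log; a real system would persist it

def traced(func: Callable) -> Callable:
    """Record the task name, its inputs, a digest of those inputs, and its output."""
    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        result = func(*args)
        digest = hashlib.sha256(json.dumps(args).encode()).hexdigest()[:12]
        PROVENANCE.append({"task": func.__name__, "inputs": list(args),
                           "input_digest": digest, "output": result})
        return result
    return wrapper

@traced
def normalize(values: List[float]) -> List[float]:
    total = sum(values)
    return [v / total for v in values]

@traced
def top(values: List[float]) -> float:
    return max(values)

top(normalize([1, 3]))
for record in PROVENANCE:
    print(record["task"], record["inputs"], "->", record["output"])
```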

Conclusion

● Workflow Systems:

- Support scientific processes

- Integrate programs, methods, agents, and services

- Aid knowledge discovery from Big Data

- Need to deal with failures

Thank You!