



Hindawi Publishing Corporation
ISRN Software Engineering
Volume 2013, Article ID 404525, 15 pages
http://dx.doi.org/10.1155/2013/404525

Review Article
Workflow Systems for Science: Concepts and Tools

Domenico Talia

ICAR-CNR and University of Calabria, 87036 Rende, Italy

Correspondence should be addressed to Domenico Talia; [email protected]

Received 3 December 2012; Accepted 23 December 2012

Academic Editors: J. Cao, B. C. Lai, and K. Thramboulidis

Copyright © 2013 Domenico Talia. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The wide availability of high-performance computing systems, Grids, and Clouds has allowed scientists and engineers to implement more and more complex applications to access and process large data repositories and run scientific experiments in silico on distributed computing platforms. Most of these applications are designed as workflows that include data analysis, scientific computation methods, and complex simulation techniques. Scientific applications require tools and high-level mechanisms for designing and executing complex workflows. For this reason, in the past years, many efforts have been devoted towards the development of distributed workflow management systems for scientific applications. This paper discusses basic concepts of scientific workflows and presents workflow system tools and frameworks used today for the implementation of applications in science and engineering on high-performance computers and distributed systems. In particular, the paper reports on a selection of workflow systems largely used for solving scientific problems and discusses some open issues and research challenges in the area.

1. Introduction

Workflows have emerged as an effective paradigm to address the complexity of scientific and business applications. The wide availability of high-performance computing systems, Grids, and Clouds has allowed scientists and engineers to implement more and more complex applications to access and process large data repositories and run scientific experiments in silico on distributed computing platforms. Most of these applications are designed as workflows that include data analysis, scientific computation methods, and complex simulation techniques.

The design and execution of many scientific applications require tools and high-level mechanisms. Simple and complex workflows are often used to reach this goal. For this reason, in the past years, many efforts have been devoted towards the development of distributed workflow management systems for scientific applications. Workflows provide a declarative way of specifying the high-level logic of an application, hiding the low-level details that are not fundamental for application design. They are also able to integrate existing software routines, datasets, and services in complex compositions that implement scientific discovery processes.

In the most general terms, according to the Workflow Management Coalition [1–3], a workflow is "the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules." The same definition can be used for scientific processes composed of several processing steps that are connected together to express data and/or control dependencies [4, 5]. The term process indicates a set of tasks linked together with the goal of creating a product, calculating a result, or providing a service. Hence, each task (or activity) represents a piece of work that forms one logical step of the overall process [6, 7].

A workflow is a well-defined, and possibly repeatable, pattern or systematic organization of activities designed to achieve a certain transformation of data. Workflows, as practiced in scientific computing, derive from several significant precedent programming models that are worth noting because they have greatly influenced the way we think about workflows in scientific applications. We can call these the dataflow model, in which data is streamed from one actor to another. While the pure dataflow concept is extremely elegant, it is very hard to put into practice because distributing control in a parallel or distributed system can create applications that are not very fault tolerant.

Consequently, many workflow systems that use a dataflow model for expressing the computation may have an implicit centralized control program that sequences and schedules each step. An important benefit of workflows is that, once defined, they can be stored and retrieved for modifications and/or reexecution: this allows users to define typical patterns and reuse them in different scenarios [8]. Figure 1 shows an example workflow of a scientific experiment of the Neptune project led by the University of Washington, which includes control nodes (processes) and data nodes (data values). This experiment involves the collection of data by ocean buoys (containing a temperature sensor and an ocean current sensor), which is then processed by a scientific workflow. The scientific workflow is composed of four steps that process the data from the sensors and create visualization charts as output [9].

The definition, creation, and execution of workflows are supported by a so-called workflow management system (WMS). A key function of a WMS during the workflow execution (or enactment) is coordinating the operations of the individual activities that constitute the workflow.

Through the integrated use of computer science methods and technologies and scientific discovery processes, science has entered a new era in which scientific methods have changed significantly through the use of computational methods and new data analysis strategies, creating the so-called e-science paradigm, as discussed by Bell et al. [10]. For example, the Pan-STARRS [11] astronomical survey uses Microsoft Trident Scientific Workflow Workbench workflows to load and validate telescope detections running at about 30 TB per year. Similarly, the USC Epigenome Center is currently using the Pegasus workflow system to exploit the Illumina Genetic Analyzer (GA) system to generate high-throughput DNA sequence data (up to 8 billion nucleotides per week) to map the epigenetic state of human cells on a genome-wide scale. In this scenario, scientific workflows demonstrate their effectiveness as a paradigm for programming, at a high level, complex scientific applications that in general run on supercomputers or on distributed computing infrastructures such as Grids, peer-to-peer systems, and Clouds [12–14].

In this paper we discuss basic concepts of scientific workflows and present a selection of the most used scientific workflow systems, such as Taverna, Pegasus, Triana, Askalon, Kepler, GWES, and Karajan, which provide innovative tools and frameworks for the implementation of applications in science and engineering. These workflow systems run on parallel computers and distributed computing systems to obtain the high performance and large data access that are needed in the solution of complex scientific problems [15, 16]. Each of those workflow management systems running in parallel and distributed computing environments has interesting features and represents a valid example of how workflow-based programming can be used in developing scientific applications.

Figure 1: The scientific workflow defined in the Neptune project.

We also discuss some open research issues in the area of scientific workflows that are investigated today and are worthy of further study by the computer science community for providing novel solutions in the area.

The rest of the paper is organized as follows. Section 2 introduces the main programming issues in the area of scientific workflows. Section 3 presents a set of significant workflow management systems running on parallel/distributed computing systems. Section 4 discusses a set of main issues that are still open in the area of scientific workflows and of topics that must be investigated to find innovative solutions. Section 5 concludes the paper.

2. Programming Scientific Workflows

A main issue in workflow management systems is the programming structures they provide to the developer who needs to implement a scientific application. Moreover, whereas some systems provide a textual programming interface, others are based on a visual programming interface. These two different interfaces imply different programming approaches.



Figure 2: A simple DAG including data and control nodes.


This section discusses the current programming approaches used in scientific workflow systems and compares them. Often a scientific workflow is programmed as a graph of several data and processing nodes that include predefined procedures programmed in a programming language such as Java, C++, or Perl. According to this approach, a scientific workflow is a methodology to compose predefined programs that run single tasks but that, composed together, represent a complex application that generally needs large resources for running and may take a long time to complete. For an introductory reference, see [17], which gives a taxonomy of the main features of scientific workflows. In fact, it is not rare to have scientific workflows, for example, in the astronomy domain or in bioinformatics [18, 19], which take several days or weeks to complete their execution.

Workflow tasks can be composed together following a number of different patterns, whose variety helps designers address the needs of a wide range of application scenarios [20]. A comprehensive collection of workflow patterns, focusing on the description of control flow dependencies among tasks, has been described in [20]. The most common programming structure used in workflow management systems is the directed acyclic graph (DAG) (see Figure 2) or its extension that includes loops, that is, the directed cyclic graph (DCG).

Of the many possible ways to distinguish workflow computations, one is to consider a simple complexity scale. At the most basic level one can consider linear workflows, in which a sequence of tasks must be performed in a specified linear order. The first task transforms an initial data object into a new data object that is used as input in the next data-transformation task. The execution of the entire chain of tasks may take a few minutes, or it may take days, depending on the computational grain of each single task in the pipeline. When the execution time is short, the most common workflow programming tool is a simple script written, for instance, in Python, Perl, or even Matlab. The case of longer-running workflows often requires more sophisticated tools or programming languages.
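
A linear workflow of this kind can be sketched as a plain Python script in which each task is a function that consumes the output of the previous one; the task names and file paths below are illustrative and not taken from any specific system.

```python
# A minimal sketch of a linear (pipeline) workflow as a plain Python script.
# The three tasks and the file names are purely illustrative.

def fetch_data(path):
    """Task 1: read the raw input data object."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def transform(records):
    """Task 2: transform the data object produced by the previous task."""
    return [r.upper() for r in records]

def store(records, path):
    """Task 3: write the final data object."""
    with open(path, "w") as f:
        f.write("\n".join(records))

if __name__ == "__main__":
    # Create a toy input data object so the sketch is self-contained.
    with open("input.txt", "w") as f:
        f.write("alpha\nbeta\ngamma\n")
    # Each task consumes the output of the preceding one, in a fixed linear order.
    raw = fetch_data("input.txt")
    processed = transform(raw)
    store(processed, "output.txt")
    print(open("output.txt").read())
```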

At the next level of complexity, one can consider workflows that can be represented by a DAG, where nodes of the graph represent tasks to be performed and edges represent dependencies between tasks. Two main types of dependencies can be considered: data dependencies (where the output of a task is used as input by the next tasks) and control dependencies (where some tasks must be completed before one or more other tasks can start). This is harder to represent with a scripting language without a substantial additional framework behind it, but it is not at all difficult to represent with a tool like Ant. It is the foundation of the Directed Acyclic Graph Manager (DAGMan) [21], a meta-scheduler of Condor [22], a specialized workload management system developed at the University of Wisconsin, and the execution engine of Pegasus. Applications that follow this pattern can be characterized by workflows in which some tasks depend upon the completion of several other tasks that may be executed concurrently [11].

A directed acyclic graph (DAG) can be used to represent a set of programs where the input, output, or execution of one or more programs is dependent on one or more other programs. According to this model, in DAGMan too, programs are nodes in the graph and the edges (arcs) identify the program dependencies. For task execution, Condor finds computers for the execution of programs, but it does not schedule programs based on dependencies [23]. DAGMan submits tasks to Condor in an order represented by a DAG and processes the task results. An input file defined prior to submission describes the workflow as a DAG, and Condor uses a description file for each program in the DAG. DAGMan is responsible for scheduling, recovery, and reporting for the set of programs submitted to Condor.
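
As a rough illustration of the DAG model (a single-process sketch, not DAGMan or Condor themselves), the fragment below lists explicit dependency edges and starts a task only when all of its parents have completed; the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks; in a real system each would submit a job to a scheduler.
def run(task):
    print(f"running {task}")

# Each task maps to the set of tasks that must complete before it can start
# (its parents in the DAG); edges encode data/control dependencies.
dag = {
    "preprocess": set(),
    "analyze_a": {"preprocess"},
    "analyze_b": {"preprocess"},
    "merge": {"analyze_a", "analyze_b"},
}

ts = TopologicalSorter(dag)
ts.prepare()
while ts.is_active():
    ready = ts.get_ready()      # tasks whose dependencies are all satisfied
    for task in ready:          # these could be dispatched concurrently
        run(task)
        ts.done(task)
```

A real workflow engine would hand the "ready" tasks to a batch system or remote services instead of calling them in-process, but the ordering logic is the same.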

The next level of workflow complexity can be characterized by cyclic graphs, where cycles represent some form of implicit or explicit loop or iteration control mechanism. In this case, the workflow graph often describes a network of tasks where the nodes are either services or some form of software component instances, or represent more abstract control objects. The graph edges represent messages, data streams, or pipes that exchange work or information among services and components.
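
In scripting terms, a cycle in the workflow graph corresponds to an iteration whose exit condition is evaluated at run time; the short sketch below, with invented task names, shows the pattern.

```python
# A minimal sketch of an iterative (cyclic) workflow: a refinement task is
# re-executed until a convergence test is satisfied. Names are illustrative.

def refine(model):
    """One pass of the iterated task; here it just moves a value toward zero."""
    return model * 0.5

def converged(model, tolerance=1e-3):
    return abs(model) < tolerance

model = 1.0
iterations = 0
while not converged(model):     # the back edge of the cyclic graph
    model = refine(model)
    iterations += 1
print(f"converged after {iterations} iterations: {model}")
```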

The highest level of workflow structure is one in which a compact graph model is not appropriate. This is the case when the graph is simply too large and complex to be effectively designed as a graph. However, some tools allow one to turn a graph into a new first-class component or service, which can then be included as a node in another graph (a workflow of workflows, or hierarchical workflow). This technique allows graphs of arbitrary complexity to be constructed. This nested workflow approach requires the use of more sophisticated tools both at the composition stage and at the execution stage. In particular, this approach makes harder the task-to-resource mapping and the task scheduling during the workflow execution, which is typically done on large-scale distributed or parallel infrastructures such as HPC systems or Cloud computing platforms [24].
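
The nested-workflow idea can be illustrated by wrapping a whole sub-workflow as a single node of an outer workflow; the classes and task names in this sketch are invented for illustration and do not correspond to any particular WMS.

```python
# Illustrative sketch: a sub-workflow wrapped as a single node of an outer workflow.

class Task:
    def __init__(self, name, func):
        self.name, self.func = name, func

    def run(self, data):
        return self.func(data)

class SubWorkflow(Task):
    """A workflow turned into a first-class component: running it runs its steps."""
    def __init__(self, name, steps):
        super().__init__(name, self._run_steps)
        self.steps = steps

    def _run_steps(self, data):
        for step in self.steps:
            data = step.run(data)
        return data

# Inner workflow: two simple cleaning steps.
inner = SubWorkflow("cleaning", [
    Task("strip", lambda xs: [x.strip() for x in xs]),
    Task("drop_empty", lambda xs: [x for x in xs if x]),
])

# Outer workflow: the inner workflow appears as an ordinary node.
outer = [inner, Task("count", lambda xs: len(xs))]

data = ["  a ", "", " b"]
for node in outer:
    data = node.run(data)
print(data)  # 2
```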

In the case of workflow enactment, two issues must also be taken into account: efficiency and robustness. In terms of efficiency, the critical issue is the ability to quickly bind workflow tasks to the appropriate computing resources. It also heavily depends on the mechanisms used to assign data to tasks and to move data between tasks that need them at various stages of the enactment. For instance, considering a service-oriented scenario, we cannot assume that web service protocols like SOAP should be used in anything other than task/service control and simple message delivery [25–29]. Complex and/or large data movement between components of the workflow must happen either via an interaction with a data movement service or through a specialized binary-level data channel running directly between the tasks involved.

Robustness is another issue, given the reasonable assumption that some parts of a workflow may fail. It is essential that exception handling include mechanisms to recover from failure as well as to detect it. Failure is also something that can happen to a workflow enactment engine. A related issue is the monitoring of the workflow. In addition to restarting a workflow from a failure checkpoint, a user may wish to track the progress of the enactment. In some cases the workflow is event driven, and a log of the events that trigger the workflow processing can be analyzed to understand how the workflow is progressing. This is also an important aspect of debugging a workflow: a user may wish to execute the workflow step by step to understand potential errors in the flow logic.
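
As a rough illustration (not taken from any of the systems discussed in this paper), a retry-with-checkpoint policy for a linear workflow could look like the following sketch; the checkpoint file name and the task list are hypothetical.

```python
import json
import os

# Hypothetical linear workflow: (task name, callable) pairs operating on a state dict.
def step_a(state):
    state["a"] = 1
    return state

def step_b(state):
    state["b"] = state["a"] + 1
    return state

TASKS = [("step_a", step_a), ("step_b", step_b)]
CHECKPOINT = "workflow_checkpoint.json"   # illustrative file name

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"done": [], "state": {}}

def save_checkpoint(cp):
    with open(CHECKPOINT, "w") as f:
        json.dump(cp, f)

def run_with_recovery(max_retries=3):
    cp = load_checkpoint()                 # resume from the last completed task
    for name, task in TASKS:
        if name in cp["done"]:
            continue                       # already completed in a previous run
        for attempt in range(max_retries):
            try:
                cp["state"] = task(cp["state"])
                cp["done"].append(name)
                save_checkpoint(cp)        # persist progress after each task
                break
            except Exception as err:
                print(f"{name} failed (attempt {attempt + 1}): {err}")
        else:
            raise RuntimeError(f"{name} failed after {max_retries} attempts")
    return cp["state"]

if __name__ == "__main__":
    print(run_with_recovery())
```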

Tools and programming interfaces for scientific workflow composition are important components of workflow systems. Through tools and interfaces, a developer can compose her/his scientific workflow and express details about data sources, task dependencies, resource availability, and other design or execution constraints. Most scientific workflow systems offer a graphical interface that provides high-level mechanisms for composition, while a few systems exhibit traditional text-based programming interfaces. Workflow users exploit the features of the interfaces that scientific workflow systems expose in order to build their workflows [30]. This stage corresponds to the design stage of a workflow. In general terms, two main workflow levels can be found in this regard, though there are other approaches that differentiate even more abstraction levels.

(i) Abstract workflows. At this high level of abstraction a workflow contains just information about what has to be done at each task, along with information about how tasks are interconnected. There is no notion of how input data is actually delivered or how tasks are implemented.

(ii) Concrete workflows. The mapping stage to a concrete workflow annotates each of the tasks with information about the implementation and/or resources to be used. Information about method invocation and the actual data exchange format is also defined.

If a user is familiar with the technology and the resources available, they can even specify concrete workflows directly. Once a workflow specification is produced, it is sent to the workflow engine for the execution phase. At this stage, workflow tasks are also mapped onto third-party, distributed, and heterogeneous resources and the scientific computations are accomplished [31, 32].
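
The distinction between the two levels can be made concrete with a small sketch in which an abstract task (what has to be done and after what) is annotated with a concrete binding (which executable, on which resource); all names and fields here are invented.

```python
from dataclasses import dataclass, field

@dataclass
class AbstractTask:
    """What has to be done and which tasks must precede it; no resource details."""
    name: str
    depends_on: list = field(default_factory=list)

@dataclass
class ConcreteTask:
    """The same task annotated with implementation and resource information."""
    abstract: AbstractTask
    executable: str        # e.g. path of the program to invoke (illustrative)
    resource: str          # e.g. a cluster or Cloud endpoint (illustrative)
    data_format: str = "csv"

def map_to_concrete(tasks, catalogue):
    """Naive mapping stage: look up each abstract task in a resource catalogue."""
    return [ConcreteTask(t, *catalogue[t.name]) for t in tasks]

abstract = [AbstractTask("align"), AbstractTask("plot", depends_on=["align"])]
catalogue = {                      # hypothetical resource/transformation catalogue
    "align": ("/opt/tools/align", "cluster-01"),
    "plot": ("/opt/tools/plot", "cloud-vm-7"),
}
for c in map_to_concrete(abstract, catalogue):
    print(c.abstract.name, "->", c.executable, "on", c.resource)
```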

3. Scientific Workflow Management Systems

As mentioned above, workflow management systems are software environments providing tools to define, compose, map, and execute workflows. There are several WMSs on the market, most of them targeted to a specific application domain, such as, in our case, scientific workflows [33–35]. In this section we focus on some significant WMSs developed by the research community, with the goal of identifying the most important features and solutions that have been proposed for workflow management in the scientific domain [36–43].

Although a standard workflow language, the Business Process Execution Language (BPEL) [44–47], has been defined, scientific workflow systems have often developed their own workflow model for allowing users to represent workflows. Other than BPEL, other formalisms such as UML, Petri nets [48, 49], and XML-based languages [50, 51] are used to express workflows. This makes the sharing of workflow specifications difficult and limits interoperability among workflow-based applications developed by using different workflow management systems. Nevertheless, there are some historical reasons for that, as many scientific workflow systems and their workflow models were developed before BPEL existed [52].

This section presents a set of workflow management systems specifically designed for composing and running scientific applications. As we will discuss, such systems are very useful in solving complex scientific problems and offer real solutions for improving the way in which scientific discovery is done in all science domains, from physics to bioinformatics, engineering, and molecular biology. The solutions they adopted are worth taking into account in new WMSs and represent features that workflow systems must include to support users in scientific application development.

Existing workflow models can be grouped roughly into the following two main classes [53].

(i) Script-like systems, where workflow descriptions specify workflows by means of a textual programming language that can be described by a grammar, in a way analogous to traditional programming languages like Perl, Ruby, or Java. They often have complex semantics and an extensive syntax. These types of descriptions declare tasks and their parameters through a textual specification. Typically, data dependencies can be established between them by annotations. These languages contain specific workflow constructs, such as sequences, loops, while-do, or parallel constructs, in order to build up a workflow. Examples of script workflow descriptions are GridAnt [54] and Karajan [55]. A commonly used script-based approach to describe workflows, mainly in the business workflow community, is BPEL and its recent version for Web services, which builds on IBM's Web Service Flow Language (WSFL).

(ii) Graphical-based systems, where workflow models specify the workflow with only a few basic graphical elements that correspond to graph components such as nodes and edges. Compared with script-based descriptions, graphical-based systems are easier to use and more intuitive for unskilled users, mainly because of their graphical representation: nodes typically represent workflow tasks, whereas communications (or data dependencies) between different tasks are represented as links going from one node to another. Workflow systems that support graph-based models often incorporate graphical user interfaces which allow users to model workflows by dragging and dropping graph elements. Purely graph-based workflow descriptions generally utilize DAGs. As mentioned before, directed acyclic graph-based languages offer limited expressiveness, so they cannot represent complex workflows (e.g., loops cannot be expressed directly).

3.1. Taverna. Taverna [56, 57] is an open-source Java-based workflow management system developed at the University of Manchester. The primary goal of Taverna is to support the life sciences community (biology, chemistry, and medicine) in designing and executing scientific workflows and in supporting in silico experimentation, where research is performed through computer simulations with models closely reflecting the real world. Even though most Taverna applications lie in the bioinformatics domain, it can be applied to a wide range of fields since it can invoke any web service by simply providing the URL of its WSDL document. This feature is very important in allowing users of Taverna to reuse code (represented as a service) that is available on the internet. Therefore, the system is open to third-party legacy code by providing interoperability with web services.

In addition to web services, Taverna supports the invocation of local Java services (Beanshell scripts), local Java APIs (API Consumer), and R scripts on an R server (Rshell scripts), and it can import data from a CSV or Excel spreadsheet.

The Taverna suite includes the following four tools.

(i) Taverna Engine, used for enacting workflows.

(ii) Taverna Workbench, a client application that enables users to graphically create, edit, and run workflows on a desktop computer (see Figure 3).

(iii) Taverna Server, which enables users to set up a dedicated server for executing workflows remotely.

(iv) A Command Line Tool, for quick execution of workflows from a command prompt.

These tools bring together a wide range of features that make it easier to find, design, execute, and share complex workflows. Such features include

(i) pipelining and streaming of data;

(ii) implicit iteration of service calls;
(iii) conditional calling of services;
(iv) customizable looping over a service;
(v) failover and retry of service calling;
(vi) parallel execution and a configurable number of concurrent threads;
(vii) managing previous runs and workflow results.

Figure 3: A snapshot of the Taverna Workbench.

In addition, Taverna provides service discovery facilities and integrated support for browsing curated service catalogues, such as the BioCatalogue. Finally, even though Taverna was originally designed for accessing web services and local processes, it can also be used for accessing high-performance computing infrastructures through the TavernaPBS plugin, developed at the University of Virginia, which allows a user to define workflows running on a cluster that uses a PBS queuing system.

3.2. Triana. Triana [58–60] is a Java-based scientific workflow system, developed at Cardiff University, which combines a visual interface with data analysis tools. It can connect heterogeneous tools (e.g., web services, Java units, and JXTA services) in one workflow. Triana uses its own custom workflow language, although it can use other external workflow language representations, such as BPEL, which are available through pluggable language readers and writers. Triana comes with a wide variety of built-in tools for signal analysis, image manipulation, desktop publishing, and so forth, and gives users the ability to easily integrate their own tools.

The Triana framework is based on a modularized architecture in which the GUI connects to a Triana engine, called the Triana Controlling Service (TCS), either locally or remotely. A client may log into a TCS, remotely compose and run an application, and then visualize the result locally. Applications can also be run in batch mode; in this case, a client may periodically log back in to check the status of the application.

The Triana GUI (see Figure 4) includes a collection of toolboxes containing a set of Triana components, and a workspace where the user can graphically define the required application behavior as a workflow of components.



Figure 4: A snapshot of the Triana GUI used to compose a data analysis workflow.

Figure 5: The Pegasus scientific workflow main menu.

Each component includes information about input and output data types, and the system uses this information to perform design-time type checking on requested connections in order to ensure data compatibility between components. Triana provides several workflow patterns, including loops and branches. Moreover, the workflow components are late bound to the services they represent, thus ensuring a highly dynamic behavior.

Beyond the conventional operational usage, in which applications are graphically composed from collections of interacting units, other usages in Triana include using the generalized writing and reading interfaces for integrating third-party services and workflow representations within the GUI.

3.3. Pegasus. The Pegasus system [61], developed at the University of Southern California, includes a set of technologies to execute workflow-based applications in a number of different environments, including desktops, clusters, Grids, and Clouds. Pegasus has been used in several scientific areas including bioinformatics, astronomy, earthquake science, gravitational wave physics, and ocean science.

The Pegasus workflow management system can manage the execution of an application formalized as a workflow by mapping it onto available resources and executing the workflow tasks in the order of their dependencies (see Figure 5). All the input data and computational resources necessary for workflow execution are automatically located by the system. Pegasus also includes a sophisticated error recovery system that tries to recover from failures by retrying tasks or the entire workflow, by remapping portions of the workflow, by providing workflow-level checkpointing, and by using alternative data sources, when possible. Finally, in order for a workflow to be reproduced, the system records provenance information including the locations of data used and produced, and which software was used with which parameters.

The Pegasus system includes the following three main components.

(i) The Mapper, which builds an executable workflow based on an abstract workflow provided by the user or generated by the workflow composition system. To this end, this component finds the appropriate software, data, and computational resources required for workflow execution. The Mapper can also restructure the workflow in order to optimize performance and add transformations for data management or to generate provenance information.

(ii) The Execution Engine, which executes in the appropriate order the tasks defined in the workflow. This component relies on the compute, storage, and network resources defined in the executable workflow to perform the necessary activities.

(iii) The Task Manager, which is in charge of managing single workflow tasks, by supervising their execution on local or remote resources.

Recent research activities carried out on Pegasus investigated the system implementation on Cloud platforms and how to manage computational workflows in the Cloud for developing scalable scientific applications [62–65].

3.4. Kepler. Kepler [66] is a Java-based open source software framework providing a graphical user interface and a run-time engine that can execute workflows either from within the graphical interface or from a command line. It is developed and maintained by a team consisting of several key institutions at the University of California and has been used to design and execute various workflows in biology, ecology, geology, chemistry, and astrophysics.

Kepler is based on the concept of directors, which dictate the models of execution used within a workflow. Single workflow steps are implemented as reusable actors that can represent data sources, sinks, data transformers, analytical steps, or arbitrary computational steps (see Figure 6). Each actor can have one or more input and output ports, through which streams of data tokens flow, and may have parameters to define specific behavior. Once defined, Kepler workflows can be exchanged using an XML-based formalism.
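
The actor/port/token idea can be sketched in a few lines; this is only a conceptual illustration of the model and does not use Kepler's actual Java API or its directors.

```python
from queue import Queue

class Actor:
    """Conceptual actor: reads tokens from an input port, writes to an output port."""
    def __init__(self, name, func):
        self.name, self.func = name, func
        self.inport, self.outport = Queue(), Queue()

    def fire(self):
        # Consume one token, apply the actor's behavior, emit one token.
        token = self.inport.get()
        self.outport.put(self.func(token))

# A "director" (here: a trivial sequential one) dictates the execution model.
def sequential_director(actors, tokens):
    for token in tokens:
        actors[0].inport.put(token)
        for upstream, downstream in zip(actors, actors[1:] + [None]):
            upstream.fire()
            out = upstream.outport.get()
            if downstream is None:
                print("result:", out)
            else:
                downstream.inport.put(out)

doubler = Actor("double", lambda x: 2 * x)
labeler = Actor("label", lambda x: f"value={x}")
sequential_director([doubler, labeler], [1, 2, 3])
```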

By default, Kepler actors run as local Java threads. However, the system also allows spawning distributed execution threads through Web and Grid services. Moreover, Kepler supports foreign language interfaces via the Java Native Interface (JNI), which gives the user flexibility to reuse existing analysis components and to target appropriate computational tools. For example, Kepler includes a Matlab actor and a Python actor.

Figure 6: Example of a scientific workflow in Kepler [67].

The Web and Grid service actors allow users to utilize distributed computational resources in a single workflow. The web service actor provides the user with an interface to easily plug in and execute any WSDL-defined web service. Kepler also includes a web service harvester for plugging in a whole set of services found on a web page or in a UDDI repository. A suite of data transformation actors (XSLT, XQuery, Perl, etc.) makes it possible to link semantically compatible but syntactically incompatible web services together.

In addition to standard web services, Kepler also includes specialized actors for executing jobs on a Grid. They include actors for certificate-based authentication (ProxyInit), submitting jobs to a Grid (GlobusJob), as well as for Grid-based data transfer (GridFTP). Finally, Kepler includes actors for database access and querying: DBConnect, which emits a database connection token, to be used by any downstream DBQuery actor that needs it.

3.5. Askalon. Askalon is an application development and runtime environment, developed at the University of Innsbruck, which allows the execution of distributed workflow applications in service-oriented Grids [68]. Its SOA-based runtime environment uses the Globus Toolkit as Grid middleware.

Workflow applications in Askalon are described at a high level of abstraction using a custom XML-based language called the Abstract Grid Workflow Language (AGWL) [69]. AGWL allows users to concentrate on modeling scientific applications without dealing with the complexity of the Grid middleware or any specific implementation technology such as Web and Grid services, Java classes, or software components. Activities in AGWL can be connected using a rich set of control constructs, including sequences, conditional branches, loops, and parallel sections.

The Askalon architecture (Figure 7) includes a wide set of services.

(i) Resource broker, which provides for the negotiation and reservation of resources as required to execute a Grid application.

(ii) Resource monitoring, which supports the monitoring of Grid resources by integrating existing Grid resource monitoring tools with new techniques (e.g., rule-based monitoring).

(iii) Information service, a general-purpose service for the discovery, organization, and maintenance of resource- and application-specific data.

(iv) Workflow executor, which supports dynamic deployment, coordinated activation, and fault-tolerant completion of activities onto remote Grid nodes.

(v) Metascheduler, which performs a mapping of individual or multiple workflow applications onto the Grid.

(vi) Performance prediction, a service for the estimation of the execution time of atomic activities and data transfers, as well as of Grid resource availability.

(vii) Performance analysis, a service that unifies the performance monitoring, instrumentation, and analysis for Grid applications and supports the interpretation of performance bottlenecks.

3.6. Weka4WS. Workflows are widely used in data mining systems to manage the data and execution flows associated with complex applications [70]. Weka, one of the most used open-source data mining systems, includes the KnowledgeFlow tool, which provides a drag-and-drop interface to compose and execute data mining workflows. Unfortunately, the Weka KnowledgeFlow allows users to execute a whole workflow only on a single computer. On the other hand, most data mining workflows include several independent branches that could be run in parallel on a set of distributed machines to reduce the overall execution time. The Grid Lab of the University of Calabria implemented distributed workflow composition and execution in Weka4WS, a framework that extends Weka and its KnowledgeFlow environment to exploit distributed resources available in a Grid using web service technologies [71].

As shown in Figure 8, a workflow in Weka4WS is built through a GUI that offers an easy interface to select data, tasks, and the links to connect them [72]. Thus, a workflow can be composed by selecting components from a tool bar, placing them on a layout canvas, and connecting them together; each component of the workflow carries out a specific step of the data mining process. A user operates on the KnowledgeFlow GUI to build a data mining workflow, typically composed of multiple data mining tasks: the local tasks are executed by invoking the local Weka library, while the remote ones are performed through a client module which acts as an intermediary between the GUI and the remote web services. Each task is carried out in a thread of its own, thus allowing multiple tasks to run in parallel.

Weka4WS uses a service-oriented approach in which all the Weka data mining algorithms are wrapped as web services and deployed on Grid nodes. Users can compose and invoke those services in a transparent way by defining data mining workflows as in the original Weka KnowledgeFlow. This approach allows task-parallel, distributed data mining applications to be defined in an easy and effective way [73].

Task parallelism is a form of parallelism that runs multiple independent tasks in parallel on different processors, available on a single parallel machine or on a set of machines connected through a network such as the Grid. Another form of parallelism that can be effectively exploited in data mining workflows is data parallelism: a large data set is split into smaller chunks, each chunk is processed in parallel, and the results of each processing step are then combined to produce a single result. Both forms of parallelism aim to achieve execution time speedup and a better utilization of the computing resources; but while task parallelism focuses on running multiple tasks in parallel, so that the overall execution time corresponds to that of the slowest task, data parallelism focuses on reducing the execution time of a single task by splitting it into subtasks, each one operating on a subset of the original data.



Figure 7: The general architecture of the Askalon system.

Figure 8: A workflow implementing a distributed data mining application in Weka4WS.


The data-parallel approach is widely employed in distributed data mining, as it makes it possible to process very large datasets that could not be analyzed on a single machine due to memory limitations and/or computing time constraints. An example of a data-parallel application is distributed classification: the dataset is partitioned into different subsets that are analyzed in parallel by multiple instances of a given classification algorithm; the resulting "base classifiers" are then used to obtain a global model through various selection and combining techniques. Through the KnowledgeFlow programming interface, Weka4WS supports the implementation of workflows that use both data parallelism and task parallelism.
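
A data-parallel step of this kind can be sketched with Python's multiprocessing module: the dataset is partitioned, each chunk is processed by a separate worker, and the partial results are combined into a global one; the "analysis" here is just a label count and all names are illustrative.

```python
from multiprocessing import Pool

def analyze_chunk(chunk):
    """Process one partition of the dataset (here: count items per class)."""
    counts = {}
    for label in chunk:
        counts[label] = counts.get(label, 0) + 1
    return counts

def combine(partials):
    """Combine the partial results into a single global result."""
    total = {}
    for part in partials:
        for label, n in part.items():
            total[label] = total.get(label, 0) + n
    return total

if __name__ == "__main__":
    dataset = ["spam", "ham", "spam", "ham", "ham", "spam", "spam", "ham"]
    n_chunks = 4
    chunks = [dataset[i::n_chunks] for i in range(n_chunks)]  # partition the data
    with Pool(processes=n_chunks) as pool:
        partials = pool.map(analyze_chunk, chunks)            # process in parallel
    print(combine(partials))   # e.g. {'spam': 4, 'ham': 4}
```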

3.7. GWES. The General Workflow Execution Service (GWES) [74] (formerly the Grid Workflow Execution Service) is a workflow system that implements a multilevel abstraction and semantic-based solution to facilitate the decoupling between tasks and resources or services. With the implementation of a plugin concept for arbitrary workflow activities, it now has a broader area of application, not limited to the orchestration of Grid and web services. GWES coordinates the composition and execution process of workflows in arbitrary distributed systems, such as SOA, cluster, Grid, or Cloud environments.

GWES processes workflows that are described using the Grid Workflow Description Language (GWorkflowDL), which is based on the formalism of high-level Petri nets (HLPNs) [75]. In the workflow specifications, transitions represent tasks and tokens in the nets represent data flowing through the workflow. Hierarchical Petri nets are also exploited to model hierarchical workflow specifications and to support multilevel abstraction. An abstract task at the top of the hierarchy can be mapped dynamically at runtime by refining the workflow structure.

Figure 9 shows how GWES operates in the automatic mapping of dynamic workflows to determine the appropriate and available resources.



Figure 9: Abstract layers in the GWES workflow refinement process.

User requests are first mapped to abstract workflows (yellow). Each activity is initially assigned, with the help of the resource matcher, to service candidates (blue) that provide the appropriate functionality. A scheduler then selects one of the candidates (green) and carries out the activity on the appropriate resources. In accordance with the Petri net refinement paradigm, places and transitions within the net can be refined with additional Petri nets, thereby facilitating the modeling of large real-world systems.

However, the GWorkflowDL does not support the inherent modeling of the dynamic refinement process itself; consequently, the workflow structure is modified dynamically by the workflow engine.

This requires the user to have a deep knowledge of the workflow engine functionality, since there is no simple construct for it in the GWorkflowDL. Additionally, GWES also supports exception handling in hierarchical scientific workflows, whereby the workflow engine can modify part of the workflow structure upon a failure, providing great levels of dynamism and flexibility. Nevertheless, GWES does not support a clear separation of exception handling from the application data and control flow, apart from simple rescheduling techniques and checkpoint/restart functionalities.

3.8. DVega. DVega [76, 77] is a scientific workflow engine that adapts itself to the changing availability of resources, minimizing human intervention. DVega utilizes reference nets [78], a specific class of Petri nets, for composing workflow tasks in a hierarchical way, and the Linda [79] communication paradigm for isolating workflow tasks from resources.

Workflow tasks in DVega interact with resources by exchanging messages: in particular, workflow tasks send/receive messages to/from Linda, and the forwarder-receiver components in DVega are responsible for taking the messages from Linda and sending them to the resources, and vice versa.

Figure 10: Components of the DVega architecture.

DVega's architecture, shown in Figure 10, is completely built upon service-oriented principles and, by means of a tuple space, shields workflows from the heterogeneity of the middleware.

One of the aspects of the architecture that is worth highlighting is the mapping between tasks and resources. A workflow task receives its inputs, generates a tuple with them, and writes it into the tuple space (Linda). After that, the workflow task is suspended until the expected result is back: once the tuple is in Linda, the corresponding proxy takes the tuple, transforms it into the suitable format, and forwards it to the destination resource. When the result arrives at the proxy, it is transformed into a message and written into the tuple space. Then, the workflow task withdraws the expected tuple containing the result. The main advantage of this approach is that a workflow task only has to indicate the type of interaction required, without explicitly sending a message to a specific proxy. As a consequence, proxies can be added and modified at runtime, without having to stop the execution, and can be shared by workflow tasks.
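
The write/withdraw pattern can be mimicked with a toy in-process tuple space; DVega actually relies on RLinda and reference nets, so the sketch below, with invented names, only illustrates how a task and a proxy interact through tuples.

```python
import threading

class TupleSpace:
    """Toy Linda-like space: out() writes a tuple, in_() withdraws a matching one."""
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def in_(self, pattern):
        # pattern: a tuple in which None matches any field
        def matches(t):
            return len(t) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, t))
        with self._cond:
            while True:
                for t in self._tuples:
                    if matches(t):
                        self._tuples.remove(t)
                        return t
                self._cond.wait()

space = TupleSpace()

def proxy():
    """Plays the role of a forwarder/receiver bound to a resource."""
    kind, payload, task_id = space.in_(("request", None, None))
    space.out(("result", payload.upper()))   # pretend the resource did some work

threading.Thread(target=proxy, daemon=True).start()
space.out(("request", "align genome", "task-42"))   # workflow task writes a tuple
print(space.in_(("result", None)))                  # ...and withdraws the result
```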

3.9. Karajan. The Java CoG Kit Karajan system allows users to compose workflows through an XML scripting language, as well as with an equivalent, more user-friendly language called K [54, 55]. Both languages support hierarchical workflow descriptions based on DAGs and have the ability to use sequential and parallel control structures such as if, while, and parallel in order to easily express concurrency. Moreover, the Karajan workflow framework can support hierarchical workflows based on DAGs including the sequential control structures and parallel constructs, and it can make use of underlying Grid tools such as Globus GRAM [80] for distributed/parallel execution of workflows.

The architecture of the Java CoG Kit Karajan framework, shown in Figure 11, contains the workflow engine, which interacts with high-level components, namely, a visualization component that provides a visual representation of the workflow structure and allows monitoring of the execution, a checkpointing subsystem that allows the checkpointing of the current state of the workflow, and a workflow service that allows the execution of workflows on behalf of a user. A number of convenience libraries enable the workflow engine to access specific functionalities.

Workflow specifications can be visualized in the system. The analysis of dynamism support in Karajan reveals that workflows can actually be modified during runtime through two mechanisms. The first one is through the definition of elements that can be stored in a workflow repository that gets called during runtime. The second one is through the specification of schedulers that support the dynamic association of resources to tasks. The execution of a workflow can either be conducted through the instantiation of the workflow on the user client or be carried out on behalf of the user on a service. Besides, the execution engine of the system also features workflow checkpointing and rollback.

3.10. DIS3GNO. An important benefit of workflows is that, once defined, they can be stored and retrieved for modifications and/or reexecution. This feature allows users to define typical data mining patterns and reuse them in different contexts. We worked in this direction by defining a service-oriented workflow formalism and a visual software environment, named DIS3GNO, to design and execute distributed data mining tasks over the Knowledge Grid [81], a service-oriented framework for distributed data mining on the Grid.

In DIS3GNO a workflow is represented as a directed acyclic graph whose nodes represent resources and whose edges represent the dependencies among the resources [82, 83]. In supporting users in developing applications, DIS3GNO is the user interface for the following two main Knowledge Grid functionalities.

(i) Metadata management: DIS3GNO provides an interface to publish and search metadata about data and tools.

(ii) Design and execution management: DIS3GNO provides an environment to program and execute distributed data mining applications as service-oriented workflows, through the interaction with the execution management service of the Knowledge Grid.

The DIS3GNO GUI, depicted in Figure 12, has been designed to reflect this twofold functionality. In particular, it provides a panel (on the left) devoted to searching resource metadata and a panel (on the right) to compose and execute data mining workflows. In the top-left corner of the window there is a menu used for opening, saving, and creating new workflows, viewing and modifying some program settings, and viewing the previously computed results present in the local file system. Under the menu bar there is a toolbar containing some buttons for execution control (starting/stopping the execution and resetting the node statuses) and others for workflow editing (creation of nodes representing datasets, tools, or viewers, creation of edges, selection of multiple nodes, and deletion of nodes or edges).

Starting from the data mining workflow designed by a user, DIS3GNO generates an XML representation of the corresponding data mining application, referred to as the conceptual model. DIS3GNO passes the conceptual model to a given execution plan manager, which is in charge of transforming it into an abstract execution plan for subsequent processing by the resource allocation and execution service. This service receives the abstract execution plan and creates a concrete execution plan. To accomplish this task, it needs to evaluate and resolve a set of resources and services and choose those matching the requirements specified by the abstract execution plan. As soon as the resource allocation and execution service has built the concrete execution plan, it is in charge of coordinating its execution by invoking the coordinated execution of the services corresponding to the nodes of the concrete execution plan. The status of the computation is notified to the execution plan manager, which in turn forwards the notifications to the DIS3GNO system for visualization.

In this way, DIS3GNO operates as an intermediary between the user and the Knowledge Grid, a service-oriented system for high-performance distributed KDD. All the Knowledge Grid services for metadata and execution management are accessed transparently by DIS3GNO, thus allowing domain experts to compose and run complex data-intensive workflows without worrying about the underlying infrastructure details.

4. Discussion and Research Issues

As we discussed in the previous sections, workflows enable scientists to develop complex simulations and to run multifaceted applications that are composed of a collection of software components, data sources, web services, and legacy code. In some cases, such components are designed, developed, and run in collaborative environments.

On large-scale computing infrastructures used today for running large scientific applications, workflow systems provide an abstract representation of concurrent tasks and a distributed virtual machine to execute applications composed of many activities.

To address the big challenges in science and engineering, workflow programming formalisms including adequate abstractions for data representation and concurrent processing orchestration are needed. Besides the amount of data accessed and analyzed by the workflow tasks, the output data produced by each workflow node or stage needs to be stored and annotated with provenance and other metadata in order to interpret them in the future or reuse them in further executions or new workflows [84–87].

From the WMSs presented in the previous section, we have seen that workflow design and execution steps in a distributed scenario are typically complex and involve multiple stages that are part of the workflow lifecycle. They include

(i) textual or graphical composition,
(ii) mapping of the abstract workflow description onto the available resources,
(iii) scheduling [88], monitoring, and debugging of the subsequent execution.



Figure 11: The software architecture of the Karajan workflow management system.

Data-driven applications are more and more developed in science to exploit the large amount of digital data available today [89, 90]. Adequate workflow composition mechanisms are then needed to support the complex workflow management process that includes workflow creation, workflow reuse, and modifications made to the workflow over time.

To address the main research issues and advance the current workflow management systems used for developing scientific applications, several issues are still open and several topics must be investigated to find innovative solutions [91]. Here we give a partial list that could be used as the basis of a research agenda towards the design of the next generation of scientific workflow systems. For each item of this list we introduce the basic aspects and sketch the road towards further investigations.

(i) Adaptive Workflow Execution Models. As computing platforms become more and more dynamic and are based on virtualization, WMSs need to embody techniques for dynamically adapting the execution of workflows on such elastic infrastructures to benefit from their dynamic nature.

(ii) High-Level Tools and Languages for Workflow Composition. Whereas in the area of programming languages high-level constructs have been designed and implemented, the basic building blocks of workflows are often simple and regular. This scenario demands further investigation towards higher-level and more complex abstract structures to be included in workflow programming tools.

(iii) Scientific Workflow Interoperability and Openness. Interoperability is a main issue in large-scale experiments where many resources, data, and computing platforms are used. WMSs are often based on proprietary data and ad hoc formats. For this reason the combined use of different systems is inhibited; therefore standard formats and models are needed to support interoperability and ease cooperation among research teams using different tools.

(iv) Big Data Management and Knowledge Discovery Workflows. Some systems discussed in Section 3 are mainly devoted to supporting the implementation of workflows for data analysis and knowledge discovery. The large availability of massive and distributed data sources requires the development of tools and applications for data analysis and for extracting useful knowledge hidden in the data. In this area, current work [81, 92] shows that workflows are effective, but more research activities are needed to develop higher-level tools and scalable systems.

(v) Internet-Wide Distributed Workflow Execution. Typically, WMSs run on parallel computers and on single-site distributed systems. Recently, through the use of Grids and P2P systems, multisite workflows have been developed and executed on a geographical scale. This approach must be extended to the Internet scale to solve bigger problems and to involve several research teams and labs.
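
One concern in multisite execution is limiting wide-area data movement; the following sketch assigns each task to the hypothetical site that already holds most of its input data, as a simple illustration rather than a proposed placement algorithm.

# Hypothetical catalogue of where each data item currently resides.
data_location = {"d1": "site-EU", "d2": "site-US", "d3": "site-EU"}
task_inputs = {"t1": ["d1", "d3"], "t2": ["d2"], "t3": ["d1", "d2", "d3"]}

def place(task):
    # Choose the site that already holds most of the task's inputs.
    sites = [data_location[d] for d in task_inputs[task]]
    return max(set(sites), key=sites.count)

for task in task_inputs:
    print(task, "->", place(task))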

(vi) Service-Oriented Workflows on Cloud Infrastructures. As discussed in [92], the service-oriented paradigm allows large-scale distributed workflows to run on heterogeneous platforms and enables the integration of workflow elements developed in different programming languages. Web and Cloud services are also a paradigm that can help handle workflow interoperability, so this research issue deserves deeper investigation.
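
A minimal sketch of a service-oriented task, assuming each workflow step is exposed as a RESTful endpoint that accepts and returns JSON, so steps written in different languages interoperate through HTTP rather than a shared runtime; the endpoint URL is purely illustrative.

import json
import urllib.request

def invoke_service(url, payload):
    # POST a JSON payload to a task's service endpoint and return its JSON reply.
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Example call (the endpoint is illustrative only):
# result = invoke_service("https://example.org/tasks/align", {"input": "reads.fq"})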

Figure 12: A screenshot of the DIS3GNO GUI composing a data analysis workflow.

(vii) Workflow Composition and Execution in Exascale Computing Systems. In the coming years, exascale computing systems will become the most interesting scalable platforms on which to run huge numbers of concurrent tasks; therefore models, techniques, and mechanisms for the efficient execution of workflows on these massively parallel systems are vital.
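
One ingredient often needed at that scale is reducing per-task dispatch overhead; the sketch below bundles many fine-grained tasks into fewer scheduling units before handing them to a thread pool. The bundle size and worker count are illustrative tuning knobs, not recommendations.

from concurrent.futures import ThreadPoolExecutor

def run_bundle(bundle):
    # Execute a bundle of fine-grained tasks sequentially as one scheduling unit.
    return [task() for task in bundle]

def run_all(tasks, bundle_size=1000, workers=32):
    bundles = [tasks[i:i + bundle_size] for i in range(0, len(tasks), bundle_size)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(run_bundle, bundles):
            results.extend(partial)
    return results

# Example: 10,000 trivial tasks dispatched as 10 bundles.
print(len(run_all([lambda i=i: i * i for i in range(10_000)])))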

(viii) Fault-Tolerance and Recovery Strategies for Scientific Workflows. Only a few of the scientific WMSs used today provide explicit mechanisms to handle hardware and/or software failures. However, fault tolerance is important when large-scale or long-lived workflow-based applications are run on distributed platforms. To address these issues, both implicit and explicit fault-tolerance and recovery mechanisms must be embodied in future WMSs to reduce the impact of faults and to avoid re-executing workflows that did not complete because of faults.
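
A minimal sketch of explicit fault handling of the kind advocated here: each task result is checkpointed, completed tasks are skipped on re-execution, and failed tasks are retried a bounded number of times. The in-memory checkpoint store is an assumption made to keep the example self-contained; a real WMS would persist it.

checkpoints = {}   # in-memory stand-in for a persistent checkpoint store
attempts = {"count": 0}

def run_with_recovery(task_id, task_fn, max_retries=3):
    if task_id in checkpoints:          # already completed: skip re-execution
        return checkpoints[task_id]
    for attempt in range(1, max_retries + 1):
        try:
            result = task_fn()
            checkpoints[task_id] = result   # checkpoint the successful result
            return result
        except Exception as err:
            print(f"{task_id}: attempt {attempt} failed ({err})")
    raise RuntimeError(f"{task_id} failed after {max_retries} attempts")

def flaky_task():
    attempts["count"] += 1
    if attempts["count"] == 1:          # fail once, then succeed
        raise IOError("transient storage error")
    return "ok"

print(run_with_recovery("stage-3", flaky_task))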

(ix) Workflow Provenance and Annotation Mechanisms and Systems. Data provenance in workflows is captured as a set of dependencies between data elements [93, 94]. It may be used for interpreting data and providing reproducible results, but also for troubleshooting and optimizing workflow efficiency. Research issues to be investigated in this area concern efficient techniques for managing provenance data, for visualizing and mining those data, and for improving provenance data interoperability.
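
A minimal sketch of provenance captured as dependencies between data elements: every derived item records which task produced it and which inputs it was derived from, so a result can later be traced back to its sources. Storage and querying are reduced to an in-memory dictionary for illustration.

provenance = {}   # data item -> {producing task, input data items}

def record(output, task, inputs):
    provenance[output] = {"produced_by": task, "derived_from": list(inputs)}

def lineage(data_item):
    # Recursively collect every source a data item was derived from.
    entry = provenance.get(data_item)
    if entry is None:
        return {data_item}
    sources = set()
    for source in entry["derived_from"]:
        sources |= lineage(source)
    return sources

record("aligned.bam", "align", ["reads.fq", "ref.fa"])
record("variants.vcf", "call", ["aligned.bam"])
print(lineage("variants.vcf"))   # {'reads.fq', 'ref.fa'}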

5. Conclusions

Workflow systems support research and scientific processes by providing a paradigm that may encompass all the steps of discovery based on the execution of complex algorithms and the access and analysis of scientific data. For instance, in data-driven discovery processes, knowledge discovery tasks can produce results that confirm real experiments or provide insights that cannot be achieved in laboratories.

Computational science today requires high-performance infrastructures on which complex workflows can be run that integrate programs, methods, agents, and services coming from different organizations or sites. In this way, algorithms, data, services, and other software components are orchestrated in a single virtual framework that specifies the execution sequence and the most appropriate scheduling of this collection of resources [95, 96].

Today the availability of huge amounts of digital data, often referred to as Big Data, has given rise to a data-centric science based on the intelligent analysis of large data repositories to extract the rules hidden in raw data coming from laboratory experiments or physical phenomena [92, 97]. Workflows may help researchers in these discovery tasks by coding in a computational graph an entire scientific practice or methodology that is too complex or impossible to carry out in a laboratory.

Moreover, many scientific workflows have to operate over heterogeneous infrastructures where failures are possible; dealing with such failures in a coherent way, so that a similar set of techniques can be applied across workflow engines, is therefore another important challenge to be addressed and solved [98–102].

To summarize the contribution of this paper, basic concepts of scientific workflows and the workflow system tools and frameworks used today for implementing applications in science and engineering have been introduced and discussed. Finally, the paper reported on a selection of workflow systems largely used for solving scientific problems and discussed open issues and research challenges in the area.

References

[1] Workflow Management Coalition, Terminology and Glossary, Document Number WFMC-TC-1011, Issue 3.0, 1999.

[2] D. Hollingsworth, Workflow Management Coalition Specification: The Workflow Reference Model, Document Number TC00-1003, v. 1.1, 1995.

[3] S. Jablonski and C. Bussler, Workflow Management: Modeling Concepts, Architecture and Implementation, Thomson International Computer Press, 1996.

[4] P. Grefen and R. N. Remmerts De Vries, "A reference architecture for workflow management systems," Data and Knowledge Engineering, vol. 27, no. 1, pp. 31–57, 1998.

[5] L. Liu, C. Pu, and D. D. Ruiz, "A systematic approach to flexible specification, composition, and restructuring of workflow activities," Journal of Database Management, vol. 15, no. 1, pp. 1–40, 2004.

[6] C. Lin and S. Lu, "Architectures of workflow management systems: a survey," Tech. Rep. TR-SWR-01-2008, 2008.

[7] D. Georgakopoulos, M. Hornick, and A. Sheth, "An overview of workflow management: from process modeling to workflow automation infrastructure," Distributed and Parallel Databases, vol. 3, no. 2, pp. 119–153, 1995.

[8] S. Bowers, B. Ludaescher, A. Ngu, and T. Critchlow, "Enabling scientific workflow reuse through structured composition of dataflow and control-flow," in Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW '06), 2006.

[9] S. S. Sahoo and A. Sheth, "Provenir ontology: towards a framework for eScience provenance management," in Proceedings of the Microsoft eScience Workshop, Pittsburgh, Pa, USA, October 2009.

[10] G. Bell, T. Hey, and A. Szalay, "Beyond the data deluge," Science, vol. 323, no. 5919, pp. 1297–1298, 2009.

[11] E. Deelman, D. Gannon, M. Shields, and I. Taylor, "Workflows and e-Science: an overview of workflow system features and capabilities," Future Generation Computer Systems, vol. 25, no. 5, pp. 528–540, 2009.

[12] M. Sonntag, D. Karastoyanova, and E. Deelman, "Bridging the gap between business and scientific workflows: humans in the loop of scientific workflows," in Proceedings of the 6th IEEE International Conference on e-Science (eScience '10), pp. 206–213, December 2010.

[13] J. Yu and R. Buyya, "A taxonomy of scientific workflow systems for grid computing," SIGMOD Record, vol. 34, no. 3, pp. 44–49, 2005.

[14] E. Al-Shakarchi, P. Cozza, A. Harrison et al., "Distributing workflows over a ubiquitous P2P network," Scientific Programming, vol. 15, no. 4, pp. 269–281, 2007.

[15] K. Ostrowski, K. Birman, and D. Dolev, "Extensible architecture for high-performance, scalable, reliable publish-subscribe eventing and notification," International Journal of Web Services Research, vol. 4, no. 4, pp. 18–58, 2007.

[16] A. Lathers, M. H. Su, A. Kulungowski et al., "Enabling parallel scientific applications with workflow tools," in Proceedings of the Challenges of Large Applications in Distributed Environments (CLADE '06), pp. 55–60, June 2006.

[17] I. J. Taylor, E. Deelman, D. B. Gannon, and M. Shields, Workflows for e-Science: Scientific Workflows for Grids, Springer, London, UK, 2007.

[18] M. Mirto, M. Passante, and G. Aloisio, "The ProGenGrid virtual laboratory for bioinformatics," in Proceedings of the IEEE 25th International Symposium on Computer-Based Medical Systems (CBMS '12), 2012.

[19] M. Cannataro, A. Guzzo, C. Comito, and P. Veltri, "Ontology-based design of bioinformatics workflows on PROTEUS," Journal of Digital Information Management, vol. 2, no. 1, pp. 87–92, 2004.

[20] W. M. P. van der Aalst, A. H. M. ter Hofstede, B. Kiepuszewski, and A. P. Barros, "Workflow patterns," Distributed and Parallel Databases, vol. 14, no. 1, pp. 5–51, 2003.

[21] Condor Team, DAGMan: A Directed Acyclic Graph Manager, July 2005, http://research.cs.wisc.edu/htcondor/dagman/.

[22] M. Litzkow, M. Livny, and M. Mutka, "Condor - a hunter of idle workstations," in Proceedings of the 8th International Conference on Distributed Computing Systems, pp. 104–111, IEEE Computer Society, New York, NY, USA, 1988.

[23] P. Couvares, T. Kosar, A. Roy, J. Weber, and K. Wenger, "Workflow management in Condor," in Workflows for e-Science, pp. 357–375, Springer, New York, NY, USA, 2007.

[24] M. Mirto, M. Passante, and G. Aloisio, "A grid meta scheduler for a distributed interoperable workflow management system," in Proceedings of the IEEE 23rd International Symposium on Computer-Based Medical Systems (CBMS '10), pp. 138–143, IEEE Computer Society, Washington, DC, USA, 2010.

[25] L. Zhang, J. Zhang, and H. Cai, Services Computing, Springer, 2007.

[26] T. Erl, Service-Oriented Architecture: Concepts, Technology and Design, Pearson Education, 2005.

[27] M. Vouk and M. Singh, "Quality of service and scientific workflows," in Proceedings of the Working Conference on Quality of Numerical Software, pp. 77–89, 1996.

[28] A. Arsanjani, L. J. Zhang, M. Ellis, A. Allam, and K. Channabasavaiah, "S3: a service-oriented reference architecture," IT Professional, vol. 9, no. 3, pp. 10–17, 2007.

[29] C. Lin, S. Lu, Z. Lai et al., "Service-oriented architecture for VIEW: a visual scientific workflow management system," in Proceedings of the IEEE International Conference on Services Computing (SCC '08), pp. 335–342, 2008.

[30] Y. Gil, V. Ratnakar, E. Deelman, G. Mehta, and J. Kim, "Wings for Pegasus: creating large-scale scientific applications using semantic representations of computational workflows," in Proceedings of the 19th Innovative Applications of Artificial Intelligence Conference (IAAI '07), 2007.

[31] G. Juve and E. Deelman, "Resource provisioning options for large-scale scientific workflows," in Proceedings of the 3rd International Workshop on Scientific Workflows and Business Workflow Standards in e-Science, Indianapolis, Ind, USA, 2008.

[32] A. Tsalgatidou, G. Athanasopoulos, M. Pantazoglou et al., "Developing scientific workflows from heterogeneous services," SIGMOD Record, vol. 35, no. 2, pp. 19–25, 2006.

[33] J. A. Miller, D. Palaniswami, A. P. Sheth, K. J. Kochut, and H. Singh, "WebWork: METEOR2's web-based workflow management system," Journal of Intelligent Information Systems, vol. 10, no. 2, pp. 185–213, 1998.

[34] G. Alonso, R. Günthör, M. Kamath, D. Agrawal, A. Abbadi, and C. Mohan, "Exotica/FMDC: a workflow management system for mobile and disconnected clients," Distributed and Parallel Databases, vol. 4, no. 3, pp. 229–247, 1996.

[35] F. Leymann and D. Roller, "Business process management with FlowMark," in Proceedings of the IEEE Computer Society International Conference (COMPCON '94), pp. 230–234, 1994.

[36] B. Ludäscher, I. Altintas, C. Berkley et al., "Scientific workflow management and the Kepler system," Concurrency Computation Practice and Experience, vol. 18, no. 10, pp. 1039–1065, 2006.

[37] S. P. Callahan, J. Freire, E. Santos, C. E. Scheidegger, C. T. Silva, and H. T. Vo, "VisTrails: visualization meets data management," in Proceedings of the ACM International Conference on Management of Data (SIGMOD '06), pp. 745–747, June 2006.

[38] Y. Zhao, M. Hategan, B. Clifford et al., "Swift: fast, reliable, loosely coupled parallel computation," in Proceedings of the International Workshop on Scientific Workflows (SWF '07), pp. 199–206, July 2007.

[39] S. Majithia, M. Shields, I. Taylor, and I. Wang, "Triana: a graphical web service composition and execution toolkit," in Proceedings of the IEEE International Conference on Web Services (ICWS '04), pp. 514–521, July 2004.

[40] A. Chebotko, C. Lin, X. Fei et al., "VIEW: a visual scientific workflow management system," in Proceedings of the International Workshop on Scientific Workflows (SWF '07), pp. 207–208, July 2007.

[41] P. Kacsuk and G. Sipos, "Multi-Grid, multi-user workflows in the P-GRADE Grid portal," Journal of Grid Computing, vol. 3, no. 3-4, pp. 221–238, 2005.

[42] S. Callahan, J. Freire, E. Santos, C. Scheidegger, C. Silva, and H. Vo, "Managing the evolution of dataflows with VisTrails," in Proceedings of the IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow '06), 2006.

[43] T. Glatard, G. Sipos, J. Montagnat, Z. Farkas, and P. Kacsuk, "Workflow-level parametric study support by MOTEUR and the P-GRADE portal," in Workflows for e-Science, pp. 279–299, Springer, New York, NY, USA, 2007.

[44] Web Services Business Process Execution Language, Version 2.0, Primer, May 2007, https://www.oasis-open.org/committees/download.php/23964/wsbpel-v2.0-primer.htm.

[45] T. Fletcher, C. Ltd, P. Furniss, A. Green, and R. Haugen, "BPEL and Business Transaction Management: Choreology Submission to OASIS WS-BPEL Technical Committee," published on the Web.


[46] W. Emmerich, B. Butchart, L. Chen, B. Wassermann, and S. L. Price, "Grid service orchestration using the Business Process Execution Language (BPEL)," Journal of Grid Computing, vol. 3, no. 3-4, pp. 283–304, 2005.

[47] M. B. Juric, Business Process Execution Language for Web Services: BPEL and BPEL4WS, Packt Publishing, 2nd edition, 2006.

[48] M. Alt, A. Hoheisel, H.-W. Pohl, and S. Gorlatch, "A grid workflow language using high-level Petri nets," in Proceedings of the 6th International Conference on Parallel Processing and Applied Mathematics (PPAM '05), R. Wyrzykowski, J. Dongarra, N. Meyer, and J. Wasniewski, Eds., vol. 3911 of Lecture Notes in Computer Science, pp. 715–722, Springer, New York, NY, USA, 2006.

[49] Z. Guan, F. Hern, and P. Bangalore, Grid-flow: a grid-enabled scientific workflow system with a Petri-net-based interface [Ph.D. thesis], University of Alabama at Birmingham, 2006.

[50] M. Atay, A. Chebotko, D. Liu, S. Lu, and F. Fotouhi, "Efficient schema-based XML-to-Relational data mapping," Information Systems, vol. 32, no. 3, pp. 458–476, 2007.

[51] A. Chebotko, M. Atay, S. Lu, and F. Fotouhi, "XML subtree reconstruction from relational storage of XML documents," Data and Knowledge Engineering, vol. 62, no. 2, pp. 199–218, 2007.

[52] T. Andrews, F. Curbera, H. Dholakia et al., "Business process execution language for web services," version 1.1, 2003.

[53] F. Neubauer, A. Hoheisel, and J. Geiler, "Workflow-based Grid applications," Future Generation Computer Systems, vol. 22, no. 1-2, pp. 6–15, 2006.

[54] G. von Laszewski and M. Hategan, "Java CoG Kit Karajan/GridAnt workflow guide," Tech. Rep., Argonne National Laboratory, Argonne, Ill, USA, 2005.

[55] G. von Laszewski, M. Hategan, and D. Kodeboyina, "Java CoG Kit workflow," in Workflows for e-Science, pp. 143–166, Springer, New York, NY, USA, 2007.

[56] T. Oinn, M. Addis, J. Ferris et al., "Taverna: a tool for the composition and enactment of bioinformatics workflows," Bioinformatics, vol. 20, no. 17, pp. 3045–3054, 2004.

[57] T. Oinn, M. Greenwood, M. Addis et al., "Taverna: lessons in creating a workflow environment for the life sciences," Concurrency Computation Practice and Experience, vol. 18, no. 10, pp. 1067–1100, 2006.

[58] I. Taylor, M. Shields, I. Wang, and O. Rana, "Triana applications within Grid computing and peer to peer environments," Journal of Grid Computing, vol. 1, pp. 199–217, 2004.

[59] I. Taylor, M. Shields, I. Wang, and A. Harrison, "Visual grid workflow in Triana," Journal of Grid Computing, vol. 3, no. 3-4, pp. 153–169, 2005.

[60] I. Taylor, E. Al-Shakarchi, and S. D. Beck, "Distributed audio retrieval using Triana (DART)," in Proceedings of the International Computer Music Conference (ICMC '06), pp. 716–722, New Orleans, La, USA, November 2006.

[61] E. Deelman, G. Singh, M. H. Su et al., "Pegasus: a framework for mapping complex scientific workflows onto distributed systems," Scientific Programming, vol. 13, no. 3, pp. 219–237, 2005.

[62] A. Nagavaram, G. Agrawal, M. Freitas, G. Mehta, R. Mayani, and E. Deelman, "A cloud-based dynamic workflow for mass spectrometry data analysis," in Proceedings of the 7th IEEE International Conference on e-Science (e-Science '11), December 2011.

[63] J. S. Vöckler, G. Juve, E. Deelman, M. Rynge, and B. Berriman, "Experiences using cloud computing for a scientific workflow application," in Proceedings of the 2nd International Workshop on Scientific Cloud Computing (ScienceCloud '11), pp. 15–24, June 2011.

[64] M. Malawski, G. Juve, E. Deelman, and J. Nabrzyski, "Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds," in Proceedings of the 24th IEEE/ACM International Conference on Supercomputing (SC '12), 2012.

[65] G. Juve, E. Deelman, B. Berriman, B. P. Berman, and P. Maechling, "An evaluation of the cost and performance of scientific workflows on Amazon EC2," Journal of Grid Computing, vol. 10, no. 1, pp. 5–21, 2012.

[66] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock, "Kepler: an extensible system for design and execution of scientific workflows," in Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM '04), pp. 423–424, IEEE Computer Society, New York, NY, USA, 2004.

[67] J. H. Porter, P. C. Hanson, and C.-C. Lin, "Staying afloat in the sensor data deluge," Trends in Ecology and Evolution, vol. 27, no. 2, pp. 121–129, 2012.

[68] T. Fahringer, R. Prodan, R. Duan et al., "ASKALON: a grid application development and computing environment," in Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing, pp. 122–131, November 2005.

[69] T. Fahringer, J. Qin, and S. Hainzer, "Specification of Grid workflow applications with AGWL: an abstract Grid workflow language," in Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid '05), pp. 676–685, May 2005.

[70] A. Congiusta, G. Greco, A. Guzzo et al., "A data mining-based framework for grid workflow management," in Proceedings of the 5th International Conference on Quality Software (QSIC '05), pp. 349–356, IEEE Computer Society Press, September 2005.

[71] D. Talia, P. Trunfio, and O. Verta, "Weka4WS: a WSRF-enabled Weka toolkit for distributed data mining on Grids," in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 309–320, Porto, Portugal, 2005.

[72] M. Lackovic, D. Talia, and P. Trunfio, "A framework for composing knowledge discovery workflows in grids," in Foundations of Computational Intelligence, A. Abraham, A. Hassanien, A. Carvalho, and V. Snášel, Eds., vol. 6 of Data Mining: Theoretical Foundations and Applications, Studies in Computational Intelligence, pp. 345–369, Springer, 2009.

[73] D. Talia, P. Trunfio, and O. Verta, "The Weka4WS framework for distributed data mining in service-oriented Grids," Concurrency Computation Practice and Experience, vol. 20, no. 16, pp. 1933–1951, 2008.

[74] A. Hoheisel, "User tools and languages for graph-based Grid workflows," Concurrency Computation Practice and Experience, vol. 18, no. 10, pp. 1101–1113, 2006.

[75] K. Jensen and G. Rozenberg, High-Level Petri Nets: Theory and Application, Springer, London, UK, 1991.

[76] R. Tolosana-Calasanz, J. A. Bañares, O. F. Rana, P. Álvarez, J. Ezpeleta, and A. Hoheisel, "Adaptive exception handling for scientific workflows," Concurrency Computation Practice and Experience, vol. 22, no. 5, pp. 617–642, 2010.

[77] R. Tolosana-Calasanz, J. A. Bañares, P. Álvarez, J. Ezpeleta, and O. Rana, "An uncoordinated asynchronous checkpointing model for hierarchical scientific workflows," Journal of Computer and System Sciences, vol. 76, no. 6, pp. 403–415, 2010.

[78] R. Valk, "Petri nets as token objects: an introduction to elementary object nets," in Proceedings of the 19th International Conference on Application and Theory of Petri Nets (ICATPN '98), J. Desel and M. Silva, Eds., vol. 1420 of Lecture Notes in Computer Science, pp. 1–25, Springer, Lisbon, Portugal, June 1998.

[79] D. Gelernter, "Generative communication in Linda," ACM Transactions on Programming Languages and Systems, vol. 7, no. 1, pp. 80–112, 1985.

[80] M. Feller, I. Foster, and S. Martin, "GT4 GRAM: a functionality and performance study," in Proceedings of the TERAGRID Conference, Madison, Wis, USA, 2007.

[81] D. Talia and P. Trunfio, "How distributed data mining tasks can thrive as knowledge services," Communications of the ACM, vol. 53, no. 7, pp. 132–137, 2010.

[82] E. Cesario, M. Lackovic, D. Talia, and P. Trunfio, "Programming knowledge discovery workflows in service-oriented distributed systems," Concurrency and Computation: Practice and Experience. In press.

[83] E. Cesario, M. Lackovic, D. Talia, and P. Trunfio, "Service-oriented data analysis in distributed computing systems," in High Performance Computing: From Grids and Clouds to Exascale, I. Foster, W. Gentzsch, L. Grandinetti, and G. Joubert, Eds., pp. 225–245, IOS Press, Lansdale, Pa, USA, 2011.

[84] A. Chebotko, X. Fei, C. Lin, S. Lu, and F. Fotouhi, "Storing and querying scientific workflow provenance metadata using an RDBMS," in Proceedings of the 3rd IEEE International Conference on e-Science and Grid Computing, pp. 611–618, December 2007.

[85] I. Altintas, O. Barney, and E. Jaeger-Frank, "Provenance collection support in the Kepler scientific workflow system," in Proceedings of the International Provenance and Annotation Workshop (IPAW '06), pp. 118–132, 2006.

[86] E. Deelman and A. Chervenak, "Data management challenges of data-intensive scientific workflows," in Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGRID '08), pp. 687–692, May 2008.

[87] Open Provenance Model, 2009, http://twiki.ipaw.info/bin/view/OPM/WebHome.

[88] J. Yu and R. Buyya, "Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms," Scientific Programming, vol. 14, no. 3-4, pp. 217–230, 2006.

[89] C. Anderson, "The end of theory: the data deluge makes the scientific method obsolete," Wired, vol. 16, no. 7, 2008.

[90] G. Erbach, "Data-centric view in e-Science information systems," Data Science Journal, vol. 5, pp. 219–222, 2006.

[91] Y. Gil, E. Deelman, M. Ellisman et al., "Examining the challenges of scientific workflows," Computer, vol. 40, no. 12, pp. 24–32, 2007.

[92] D. Talia and P. Trunfio, Service-Oriented Distributed Knowledge Discovery, CRC Press, Boca Raton, Fla, USA, 2012.

[93] R. Bose and J. Frew, "Lineage retrieval for scientific data processing: a survey," ACM Computing Surveys, vol. 37, no. 1, pp. 1–28, 2005.

[94] L. Moreau, Concurrency and Computation: Practice and Experience, Special Issue on the First Provenance Challenge, 2008.

[95] C. Lin, S. Lu, X. Fei et al., "A reference architecture for scientific workflow management systems and the VIEW SOA solution," IEEE Transactions on Services Computing, vol. 2, no. 1, pp. 79–92, 2009.

[96] Y. Han, A. Sheth, and C. Bussler, "A taxonomy of adaptive workflow management," in Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW '98), Seattle, Wash, USA, 1998.

[97] T. Hey, S. Tansley, and K. Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, Redmond, Wash, USA, 2009.

[98] M. Lackovic, D. Talia, R. Tolosana-Calasanz, J. A. Bañares, and O. F. Rana, "A taxonomy for the analysis of scientific workflow faults," in Proceedings of the 2nd International Workshop on Workflow Management in Service and Cloud Computing (WMSC '10), in conjunction with CSE '10, pp. 398–403, December 2010.

[99] L. Ramakrishnan, D. Nurmi, A. Mandal et al., "VGrADS: enabling e-Science workflows on grids and clouds with fault tolerance," in Proceedings of the ACM/IEEE International Conference on High Performance Computing and Communication (SC '09), November 2009.

[100] T. Samak, D. Gunter, M. Goode, E. Deelman, G. Juve, and F. Silva, "Failure analysis of distributed scientific workflows executing in the cloud," in Proceedings of the 8th International Conference on Network and Service Management (CNSM '12), 2012.

[101] M. Kamath and K. Ramamritham, "Failure handling and coordinated execution of concurrent workflows," in Proceedings of the 14th International Conference on Data Engineering (ICDE '98), pp. 334–341, IEEE Computer Society, Washington, February 1998.

[102] R. Tolosana-Calasanz, M. Lackovic, O. Rana, J. A. Bañares, and D. Talia, "Characterizing quality of resilience in scientific workflows," in Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, pp. 117–126, ACM, New York, NY, USA, November 2011.
